The Journal of 
Experimental Education 


A periodical report of scientific investigations relating to child development, 


Page 
A Simple Method for Rotating a Centroid Factor Matrix to a Simple
Structure Hypothesis Paul Horst 251
Occupational Expectations of Twelfth Grade Michigan Boys 
E. Grant Youmans 259 
A Brief Discussion of One of the Analyses in the Experiment—The Effect 
of Fractions Orville B. Aftreth 273
A Follow-up of an Experimental Teaching Method in Music at the Fourth 
and Fifth Grade Levels Carl B. Nelson 283 


Does an Honor System Reduce Classroom Cheating? Ray Canning 291

The Comparability of the Simple Discriminant Function and Multiple
Regression Techniques William B. Michael and Norman C. Perry 297 

The Influence of Item Modality on the Dimension Measured by a Test 
Donald M. Medley 303 

Power of the Rank, Median and Run Tests When Ties are Numerous— 
Empirical Study Merle W. Tate 309 

Factor Analysis of High School Variables and Success in University Subjects
for the First Semester in the University
Elizabeth C. Baker and George A. Baker 315

A Comparison of the Accuracy of the Formula for the Standard Error of
Pearson "r" with the Accuracy of Fisher's z-Transformation
Harvey F. Dingman and Norman C. Perry 319


A Predictive Confidence Interval for the Validity Coefficient 
R. B. McHugh 323 


PUBLISHED QUARTERLY 


Published by Dembar Publications, Inc.,
Madison 3, Wisconsin.
Entered as second-class matter October 17, 1938, at the post office at Madison,
Wisconsin, under the act of March 3, 1879.


Volume XXIV June 1956 Number 4 

CONTENTS 
$5.00 A YEAR $1.50 A COPY


EDITORIAL BOARD 


A. S. Barr, Chairman, Professor of Education, University of Wisconsin, Madison 6, Wis.


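Arthur T. [...], Teachers College, [...] University, New York. Editorially responsible for measurements [...].

[...] University [...]. Editorially responsible for [...] each [...].

H. [...], [...] of [...], Indiana. Editorially responsible for [...] each September.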


CONTRIBUTING EDITORS 


[...] Psychology, State College of Washington, Pullman, [...].

[...] Consulting Psychologist, Halifax, [...].

C. W. [...], Professor of [...], University of [...].

[...] of Education, University of [...].

Louis G. Schmidt, Assi[...]

Harold Seashore, Psychological Corporation, New York [...].

[...] University of Alabama, Alabama.
[...] New York [...]
[...] Teachers College, Columbia University, New York [...]
[...] Psychology, Ohio State [...]
[...] Syracuse University, [...] 10, New York.
[...] University, New York [...]

[...] Professor of Education, New York University, New York [...]




A SIMPLE METHOD FOR ROTATING A CENTROID FACTOR MATRIX TO A SIMPLE STRUCTURE HYPOTHESIS*


PAUL HORST 
University of Washington 


PERHAPS THE best known and most commonly used technique of factor analysis is the centroid method developed by Thurstone (6). A method which is well known but computationally much more laborious is the principal axis technique developed by Hotelling (4). However, this latter method has now been adapted to high-speed electronic computers, so that the computational costs are not excessive. But whether the centroid, the principal axis, or some other method is employed, there remains the problem of rotation. Thurstone (6) has developed the concept of simple structure and emphasized the importance of transforming or rotating the arbitrary factor matrix to simple structure form. Perhaps the most commonly used methods of rotation are the graphical methods of Thurstone (6), which may be either oblique or orthogonal. A simplified mechanical method of orthogonal rotation has also been developed by Zimmerman (8). Non-graphical methods of rotation have been developed by Carroll (1), Horst (3), Thurstone (6), and Tucker (7).

As is well known the graphical methods of 
rotation are at best very time consuming. Fur- 
thermore, they involve a great deal of subjec- 
tivity in the way the rotations are accomplished. 
The non-graphical methods, on the other hand, 
are more routine and objective but they also in- 
volve a great deal of computational labor. 

In general, both the graphical and non-graphical methods of rotation involve no hypotheses about the simple structure factor composition of the test battery. They are designed to find simple structure if it exists. However, Thurstone (6) has repeatedly emphasized the importance of assembling a battery of tests for factor analysis on the basis of some a priori hypothesis as to their factor structure. Guttman (2) also suggests that the result of a factor analysis carries more scientific weight if it is designed to test an a priori theory rather than used as the basis for constructing an a posteriori theory.


In any case, suppose an a priori hypothesis is available with respect to the simple structure factor loadings of a battery, or even that a tentative theory may be deduced by some form of cluster analysis of the correlation matrix. Then, if either a centroid or a principal axis factor matrix has been computed, the relatively simple method presented here can be used to transform the factor matrix to the hypothesized simple structure matrix. The method places no restrictions on the number of simple structure factors which may contribute significantly to the variance of a test; a test may have appreciable loadings on more than one factor. The method is an approximation to a least square solution if the centroid matrix is used, and precisely a least square solution if the principal axis matrix is used. It is much more likely, however, that the centroid matrix rather than the principal axis matrix will be available.

The method will be illustrated with an example taken from Thurstone and Thurstone (5). The first 7 columns of a centroid matrix for 21 variables are reproduced in Table I. The rotated factor matrix obtained by Thurstone is reproduced in Table II. To illustrate the method, we take as our hypothesis of simple structure the results obtained in Table II. We arbitrarily designate a loading of .30 or more as nonzero and a loading of less than .30 as near zero. This gives us the factor pattern hypothesis indicated by the underlined values in Table II. These entries are also underlined in Table I. Given the hypothesized factor pattern and the centroid (or principal axis) matrix, the computational steps are as follows:


1. Calculate the row sums for the centroid matrix in Table I and enter these sums in the Σ column at the right.

2. Calculate the column sums of Table I and enter these sums in the Σ row at the bottom of the table.


*This research was supported in part by Research Grant [...] from the Public Health Service, National Institutes of Health.


TABLE I
[Centroid factor matrix for 21 variables (first 7 columns), with the Σ and h² columns and the Σ, D, and D⁻¹ rows. The body of the table is not legible in this copy; only the following summary values survive.]
D: 6.9883 1.8103 1.2138 .9722 .7655 .6197 .4926 (total 12.8624)
D⁻¹: .1431 .5524 .8238 1.0296 1.3063 1.6137 2.0300
Σ column total: 12.38

TABLE II
[Rotated factor matrix obtained by Thurstone (5); not legible in this copy.]


3. Sum the entries in the Σ column of Table I and enter this sum in the space immediately below. This should be 12.38.

4. Sum the entries in the Σ row of Table I and see that this sum is identical with the value calculated in step 3.

5. Enter the row sums of squares for Table I in the h² column at the extreme right of Table I. Note that these are the test communalities, which usually are readily available. These entries should be equal to one minus the corresponding diagonal elements of the residual correlation matrix.

6. Sum the entries in the h² column of Table I and enter this sum in the space immediately below the column.

7. Calculate the sums of squares of elements for each column of Table I and enter them in row D at the bottom of the table. The first entry in the example is 6.9883. (Note that if the principal axis method has been used, these values are already available in the form of the variance accounted for by each factor.)

8. Sum the entries obtained in step 7 to see that they are equal to the sum of the communalities on the right. This total is 12.8624.
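Steps 1 through 8 are bookkeeping with built-in checks. A minimal pure-Python sketch of the same bookkeeping follows. Because the body of Table I is not legible here, it uses a small made-up 4 × 3 loading matrix, and the function name `checked_sums` is hypothetical.

```python
def checked_sums(a):
    """Steps 1-8 for a loading matrix a (list of rows: tests x factors)."""
    row_sums = [sum(row) for row in a]                # step 1: Σ column
    col_sums = [sum(col) for col in zip(*a)]          # step 2: Σ row
    # Steps 3-4: the grand total must agree whichever way it is summed.
    assert abs(sum(row_sums) - sum(col_sums)) < 1e-9
    h2 = [sum(x * x for x in row) for row in a]       # step 5: communalities
    d = [sum(x * x for x in col) for col in zip(*a)]  # step 7: row D
    # Steps 6 and 8: total communality must equal the sum of row D.
    assert abs(sum(h2) - sum(d)) < 1e-9
    return row_sums, col_sums, h2, d


# Hypothetical 4-test, 3-factor matrix (not Thurstone's data).
a = [
    [0.60, 0.20, -0.10],
    [0.50, -0.30, 0.20],
    [0.70, 0.10, 0.30],
    [0.40, -0.20, -0.30],
]
row_sums, col_sums, h2, d = checked_sums(a)
print([round(x, 2) for x in d], round(sum(h2), 2))
```

The two asserts are the programmatic form of the hand checks in steps 4 and 8. A mismatch signals an arithmetic slip before it can propagate into the later tables.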

9. Consider all those tests whose loadings are underlined in the first column of Table I. Get the sum of the elements in the first column of the rows corresponding to these tests and enter it in Table III, row 1, column 1. This is .51 + .62 + .63 = 1.76. In the same way, add the elements in the remaining columns of these rows to get the remaining elements of the first row of Table III. Include the Σ column.

10. Sum the elements in the first row of Table III. See that this value is the same as the entry in the Σ column.

11. Repeat steps 9 and 10 for the rows of Table I corresponding to the entries underlined in the second column of Table I to get the second row of Table III.

12. Using the rows indicated by the under- 
lined entries in the remaining columns of Table 
I calculate and check the entries in the remain- 
ing rows of Table III. 

13. Calculate column sums for Table III and enter them in the Σ row at the bottom of the table.

14. Sum all elements but the last in the Σ row calculated in step 13 and see that this sum is the same as the last entry in the row. This is 12.00.
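Steps 9 through 14 build Table III, which the appendix identifies as f'a: each row adds up the rows of the loading matrix picked out by one column of the 0/1 hypothesis pattern. A minimal sketch, assuming the pattern is stored as a 0/1 matrix `f` of the same shape as the loadings. The function name `table_iii` and the 4 × 2 data are hypothetical; only the three first-column values .51, .62 and .63 echo the arithmetic of step 9.

```python
def table_iii(a, f):
    """Steps 9-14: Table III = f'a for loadings a and 0/1 pattern f (same shape).

    Row j adds up the rows of a whose entry in column j of f is 1
    (the tests underlined in column j).
    """
    n, s = len(a), len(a[0])
    t3 = [[sum(a[i][k] for i in range(n) if f[i][j]) for k in range(s)]
          for j in range(s)]
    for j in range(s):
        # Step 10: the row total must match the Σ-column entries of the same tests.
        sigma = sum(sum(a[i]) for i in range(n) if f[i][j])
        assert abs(sum(t3[j]) - sigma) < 1e-9
    sigma_row = [sum(col) for col in zip(*t3)]        # step 13: Σ row
    # Step 14: the Σ row's elements add up to the grand total of the table.
    assert abs(sum(sigma_row) - sum(map(sum, t3))) < 1e-9
    return t3, sigma_row


# Hypothetical 4 x 2 example.
a = [[0.51, -0.27], [0.62, 0.10], [0.63, 0.05], [0.10, 0.70]]
f = [[1, 0], [1, 0], [1, 0], [0, 1]]
t3, sigma_row = table_iii(a, f)
print(round(t3[0][0], 2))   # 1.76, as in step 9
```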

15. Calculate the reciprocals of the entries in row D at the bottom of Table I and enter them immediately below in row D⁻¹.

16. Multiply each entry in row D⁻¹ of Table I by the corresponding entry in row D to see that the product is unity within rounding error.

17. Multiply each entry in column 1 of Table III by the first entry in row D⁻¹ of Table I and enter the products in the first column of Table IV. Include the entry from the Σ row. The first entry in this column is 1.76 x .1431 = .252.

18. Sum all but the last of the entries in column 1 of Table IV and enter this sum in row C, immediately below the last entry. This should be the same, within rounding error, as the entry immediately above it in the Σ row. In the example these numbers are both 1.780.

19. Calculate and check the entries in column 2 of Table IV by using the second entry in row D⁻¹ of Table I and column 2 of Table III.

20. Calculate the remaining columns of Table IV by using entries from the appropriate columns of Table III and row D⁻¹ of Table I.
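Steps 15 through 20 scale each column of Table III by the reciprocal of the matching D entry. A minimal sketch follows, with the hypothetical name `table_iv`. It reproduces step 17's printed figures from the values 1.76 and 6.9883 given in the text.

```python
def table_iv(t3, d):
    """Steps 15-20: M' = (f'a) D^-1, i.e. column k of Table III times 1/d[k]."""
    d_inv = [1.0 / x for x in d]                                     # step 15
    assert all(abs(x * y - 1.0) < 1e-12 for x, y in zip(d, d_inv))  # step 16
    m_t = [[v * d_inv[k] for k, v in enumerate(row)] for row in t3]  # steps 17-20
    return m_t, d_inv


# Step 17's printed figures: Table III entry 1.76, first D entry 6.9883.
m_t, d_inv = table_iv([[1.76]], [6.9883])
print(round(d_inv[0], 4), round(m_t[0][0], 3))   # 0.1431 0.252
```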

21. Calculate the sum of squares of the entries 
in the first row of Table IV and enter in the first 
row of column A immediately to the right. This 
is 2.720. 

22. Calculate the sums of squares of row en- 
tries for the remaining rows of Table IV and en- 
ter in column A. 

23. Calculate the square roots of the entries in column A of Table IV and enter them in column B immediately to the right. The first entry in column B is √2.720 = 1.6492.

24. Calculate the reciprocals of the entries in column B of Table IV and enter them in column C immediately to the right. The first entry in column C is 1/1.6492 = .6063.

25. Multiply each entry in column A of Table IV by the corresponding entry in column C. The product should be the same, within rounding error, as the corresponding entry in column B. For the first entries we have 2.720 x .6063 = 1.6491.
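Steps 21 through 25 compute, for each row of Table IV, its squared length (A), its length (B), and the reciprocal length (C), with A x C = B as the check. A minimal sketch follows, with the hypothetical name `abc_columns`. Since the first row of Table IV is not legible here, the example starts from the printed value A = 2.720.

```python
import math


def abc_columns(m_row):
    """Steps 21-25 for one row of Table IV: A, B = sqrt(A), C = 1/B."""
    col_a = sum(x * x for x in m_row)           # step 21: sum of squares
    col_b = math.sqrt(col_a)                    # step 23: square root
    col_c = 1.0 / col_b                         # step 24: reciprocal
    assert abs(col_a * col_c - col_b) < 1e-12   # step 25: A x C = B
    return col_a, col_b, col_c


# From the printed first-row value A = 2.720, recover the printed B and C.
col_b = math.sqrt(2.720)
col_c = 1.0 / col_b
print(round(col_b, 4), round(col_c, 4))   # 1.6492 0.6063
```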

26. Multiply each of those elements in the first row of Table IV preceding column A by the first element in column C and enter the products in the first row of Table V. The first entry in this row is .252 x .6063 = .153.

27. Get the sum of squares of the entries in 
the first row of Table V and enter it immediate- 
ly to the right in column C. This should give 
unity within limits of rounding error. 

28. Calculate and check the remaining rows 
of Table V from the corresponding rows of Table 
IV as indicated in steps 26 and 27. 
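Steps 26 through 28 turn each row of Table IV into a unit-length row of Table V (a column of H in the appendix). A minimal sketch follows, with the hypothetical name `table_v_row`. Its check is applied to the first row of Table V exactly as printed in step 29. Because those values carry three decimals, their sum of squares comes out as unity only to rounding error.

```python
import math


def table_v_row(m_row):
    """Steps 26-27: scale a Table IV row by its C entry (1 / row length),
    so the resulting row of Table V, a column of H, has unit length."""
    col_c = 1.0 / math.sqrt(sum(x * x for x in m_row))
    h_row = [x * col_c for x in m_row]
    assert abs(sum(x * x for x in h_row) - 1.0) < 1e-12   # step 27 check
    return h_row


# Step 27's check on the first row of Table V as printed in step 29.
printed = [.153, -.224, -.120, .050, -.301, .108, -.899]
print(round(sum(x * x for x in printed), 3))   # 1.001
```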

29. Calculate the sum of products of corresponding entries from the first row of Table I and the first row of Table V. Include only the first seven entries from each table. Enter this sum of products in the first row and column of Table VI. This is (.51)(.153) + (-.27)(-.224) + (-.26)(-.120) + (.11)(.050) + (-.13)(-.301) + (-.09)(.108) + (-.25)(-.899) = .429.
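The arithmetic of step 29 can be checked directly. The two lists below are the seven entries printed in that step; nothing else is assumed.

```python
# Row 1 of Table I and row 1 of Table V, as printed in step 29.
table_i_row1 = [.51, -.27, -.26, .11, -.13, -.09, -.25]
table_v_row1 = [.153, -.224, -.120, .050, -.301, .108, -.899]

# Entry (1, 1) of Table VI is the sum of products of corresponding entries.
b11 = sum(x * y for x, y in zip(table_i_row1, table_v_row1))
print(round(b11, 3))   # 0.429
```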

30. Calculate the sum of products of corres- 
ponding entries from the second row of Table I 
and the first row of Table V and enter this sum 
in the second row of the first column of Table VI. 

31. Calculate the remaining entries in the first column of Table VI as in steps 29 and 30 by using the first row of Table V and the remaining rows of Table I through the Σ row. A convenient aid in making these computations is to fold Table V immediately above row 1 and place row 1 immediately below the row of Table I for which the corresponding element of column 1 in Table VI is to be computed. In this way one of the factors to be multiplied lies directly above the other.

TABLE V
[Normalized transformation vectors, written as rows (H'); not legible in this copy.]

TABLE VI
[Rotated factor matrix, with Σ and C check rows; most entries are not legible in this copy. For column 1 the Σ and C entries are 1.688 and 1.687.]

32. Sum the entries in column 1 of Table VI down to but not including the Σ row. Enter this sum in column 1 of the C row. This number should be the same, within limits of rounding error, as the one immediately above it. In the example, these two numbers are 1.688 and 1.687.

33. Calculate and check the entries in the second column of Table VI as in steps 30, 31, and 32 by using the second row of Table V and the appropriate rows of Table I. To facilitate the computations, fold Table V between rows 1 and 2 and place row 2 of Table V immediately below the row of Table I for which the corresponding entry of column 2 in Table VI is being computed.

34. Calculate and check each remaining col- 
umn of Table VI by using the corresponding row 
of Table V and the appropriate rows of Table I. 
To facilitate the computations fold Table V just 
above the row to be used and place it immediate- 
ly below the row of Table I for which the corres- 
ponding entry in the column of Table VI is being 
computed. 
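Taken together, the 34 steps compute the approximate solution of equation (7) in the appendix. The sketch below is a minimal pure-Python version of the whole chain. The function name `rotate_to_hypothesis` and all data are hypothetical. The test case starts from a known simple structure, rotates it by 30 degrees to produce an "arbitrary" matrix, and checks that the procedure recovers it. Recovery is close but not exact, because the method replaces a'a by its diagonal, and a'a is only nearly diagonal here.

```python
import math


def rotate_to_hypothesis(a, f):
    """Steps 1-34 in matrix form: returns (Table V = H', Table VI = aH).

    a: n x s centroid loadings (list of rows); f: n x s 0/1 hypothesis pattern.
    Uses the diagonal D_{a'a} in place of a'a, as the method does.
    """
    n, s = len(a), len(a[0])
    d = [sum(a[i][k] ** 2 for i in range(n)) for k in range(s)]   # row D
    t3 = [[sum(a[i][k] for i in range(n) if f[i][j]) for k in range(s)]
          for j in range(s)]                                      # Table III
    t4 = [[t3[j][k] / d[k] for k in range(s)] for j in range(s)]  # Table IV
    t5 = []
    for row in t4:                                                # Table V
        length = math.sqrt(sum(x * x for x in row))               # column B
        t5.append([x / length for x in row])
    t6 = [[sum(a[i][k] * t5[j][k] for k in range(s)) for j in range(s)]
          for i in range(n)]                                      # Table VI
    return t5, t6


# Hypothetical test: rotate a known simple structure by 30 degrees.
true_b = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
cos30, sin30 = math.cos(math.radians(30)), math.sin(math.radians(30))
a = [[x * cos30 + y * sin30, -x * sin30 + y * cos30] for x, y in true_b]
f = [[1, 0], [1, 0], [0, 1], [0, 1]]
t5, t6 = rotate_to_hypothesis(a, f)
for row in t6:
    print([round(v, 2) for v in row])   # close to true_b, off-pattern near 0
```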


Table VI is the matrix of rotated factor load- 
ings. It may be seen that these compare close- 
ly with those obtained by Thurstone in Table II. 
Obviously, of course, the illustration we have 
given has taken unfair advantage of Thurstone’s 
results. 


MATHEMATICAL APPENDIX 


The mathematical basis of the method outlined is as follows:

Let $a$ be an $n \times s$ arbitrary factor loading matrix of $n$ tests and $s$ factors.

$f$ be a simple structure hypothesis matrix of the same order as $a$. The $ij$'th element of $f$ is zero if a near-zero factor loading is assumed for that position, and unity if an appreciable loading is assumed.

$b$ is the simple structure factor matrix.

$H$ is a matrix which transforms $a$ to $b$.

$\delta$ is a diagonal matrix of order $s$ to be determined as in (4) below.

To determine $H$ we write

$$aH - f\delta = \epsilon \qquad (1)$$

The least square solution for (1) can be shown to be

$$H = (a'a)^{-1}a'f\delta \qquad (2)$$

We determine $\delta$ in (2) so that the columns of $H$ are normalized. We write

$$M = (a'a)^{-1}a'f \qquad (3)$$

We let $D_{M'M}$ be a diagonal matrix of the diagonal elements of $M'M$, where $M'$ is the transpose of $M$. ($D_{H'H}$ is similarly defined for $H$.) We write

$$\delta = D_{M'M}^{-1/2} \qquad (4)$$

Then from (2), (3), and (4)

$$H = M D_{M'M}^{-1/2} \qquad (5)$$

From (5)

$$D_{H'H} = D_{M'M}^{-1/2} D_{M'M} D_{M'M}^{-1/2} = I \qquad (6)$$

which shows that the column vectors of $H$ are of unit length.

Now if $a$ is a principal axis matrix, $a'a$ is diagonal, and the computations indicated in the example can be shown to give precisely the least square solution for $H$ in equation (1). If $a$ is a centroid matrix, then in most cases $a'a$ is very nearly diagonal. We therefore take as an approximation to $a'a$ the diagonal of the product matrix and denote it by $D_{a'a}$.

We have then, as an approximation to the solution for $H$ when $a$ is a centroid matrix,

$$H \doteq D_{a'a}^{-1}a'f\,D_{M'M}^{-1/2} \qquad (7)$$

where $M$, and hence $D_{M'M}$, is now computed as $D_{a'a}^{-1}a'f$.

Let us now compare the computational procedure outlined above with equation (7).

The first 21 rows and 7 columns of Table I constitute the matrix $a$.

The D row of Table I consists of the elements of $D_{a'a}$.

The D⁻¹ row of Table I consists of the elements of $D_{a'a}^{-1}$.

Table III is $f'a$, written for convenience as the transpose of $a'f$.

The first 7 rows and columns of Table IV constitute $M' = f'aD_{a'a}^{-1}$.

The A, B, and C columns of Table IV consist respectively of the elements of $D_{M'M}$, $D_{M'M}^{1/2}$, and $D_{M'M}^{-1/2}$, or $\delta$.

Table V is $H' = D_{M'M}^{-1/2}M'$ and is written for computational convenience as the transpose of $H$.

Table VI is the simple structure factor loading matrix $b$ given by $b = aH$.
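The appendix states that (2) "can be shown" to be the least square solution of (1). The derivation below sketches that step. It assumes that "least square" means minimizing the sum of squares of the elements of $\epsilon$ for fixed $\delta$, and that $a'a$ is nonsingular. The last line checks (6) for the choice of $\delta$ in (4).

```latex
\begin{aligned}
\operatorname{tr}(\epsilon'\epsilon)
  &= \operatorname{tr}\big[(aH - f\delta)'(aH - f\delta)\big] \\
\frac{\partial\,\operatorname{tr}(\epsilon'\epsilon)}{\partial H}
  &= 2a'(aH - f\delta) = 0 \\
\Rightarrow\quad a'a\,H &= a'f\delta
  \quad\Rightarrow\quad H = (a'a)^{-1}a'f\delta \\
\text{with } \delta = D_{M'M}^{-1/2}:\quad
D_{H'H} &= \delta\,D_{M'M}\,\delta = I .
\end{aligned}
```

When $a'a$ is itself diagonal, as for a principal axis matrix, replacing it by $D_{a'a}$ changes nothing. This is why the procedure is exact in that case and approximate for a centroid matrix.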


REFERENCES

1. Carroll, J. B. "An Analytical Solution for Approximating Simple Structure in Factor Analysis," Psychometrika, XVIII (1953), pp. 23-38.

2. Guttman, L. "Multiple Group Methods for Common-Factor Analysis: Their Basis, Computation, and Interpretation," Psychometrika, XVII (1952), pp. 209-222.

3. Horst, P. "A Non-Graphical Method for Transforming an Arbitrary Factor Matrix into a Simple Structure Matrix," Psychometrika, VI (1941), pp. 79-99.

4. Hotelling, H. "Analysis of a Complex of Statistical Variables into Principal Components," Journal of Educational Psychology, XXIV, pp. [...].

5. Thurstone, L. L., and T. G. Factorial Studies of Intelligence, Psychometric Monograph No. 2 (Chicago: University of Chicago Press, 1941), p. 91.

6. Thurstone, L. L. Multiple Factor Analysis (Chicago: University of Chicago Press, 1947), Chs. VIII, IX, X.

7. Tucker, L. R. "A Semi-Analytical Method of Factorial Rotation to Simple Structure," Psychometrika, IX (1944), pp. 43-68.

8. Zimmerman, W. S. "A Simple Graphical Method for Orthogonal Rotation," Psychometrika, XI (1946), pp. 51-55.


OCCUPATIONAL EXPECTATIONS OF TWELFTH 
GRADE MICHIGAN BOYS 


E. GRANT YOUMANS 
Arlington, Virginia* 


A TRADITIONAL and flourishing belief exists 
in the United States that any person, regardless 
of origin, can become whatever he wishes if only 
he will make the effort. This belief in unlimit- 
ed vertical mobility has been challenged recent- 
ly by a number of scientific investigations. Stud- 
ies in social class and in other forms of social 
stratification offer evidence that opportunities 
and achievements in American life are severely 
limited by social origin.1**

This paper examines the occupational expectations of twelfth grade Michigan boys in terms
of social origins. In doing so it aims to offer 
some refinement to the theory of occupational 
choice. Ginzberg has proposed that occupation- 
al choice is a continuing process which has 
distinct phases: (1) fantasy choices, (2) tenta- 
tive choices, and (3) realistic choices. The 
significant question is: What factors are in op- 
eration as an individual moves from fantasy to 
realism in his choices? Obviously many factors 
operate. This paper focuses on some of the so- 
cial factors. 

It is assumed that social origin, the home, 
the school, work experience, and type of com- 
munity are social factors which influence 
choices in occupations by young people. Do 
these broad factors contribute to realism in 
choice? Which factors are more important in 
occupational choice? To what degree do these 
social factors influence occupational choice?
The following hypothesis is investigated in this
paper: Position in the social structure, that is,
social origin, is more important in formulating
the occupational expectations of youth than are
such factors as the home, the school, work experience, and type of community.


Methods 


To test this hypothesis, use is made of the 
data collected by the Social Research Service of 
Michigan State College. The Michigan Bell Tele- 
phone Company granted funds to cover the oper- 
ating costs of an informational survey of the 
work interests and attitudes of Michigan youth. 
Dr. Charles P. Loomis, Director of the Service, appointed a committee of faculty members to carry out the project and publish a report.3 The committee designed a self-administering questionnaire of eighty items, which was filled out by a representative sample of 6789 tenth and twelfth grade youths from 56 public and private high schools in Michigan. The completed questionnaires were coded and the data tabulated on Hollerith cards.

* 1632 - 26 Road South
** All footnotes will be found at end of article.

In the investigation of the impact of social 
stratification upon the occupational expectations 
of the twelfth grade boys, which is the task of 
this paper, the occupational level of the boys’ 
fathers is used as the index of social stratifica- 
tion. Three strata are used: (1) white collar 
workers, such as professional, managerial, and 
clerical workers, (2) manual workers, such as 
skilled workers and foremen, semi-skilled 
workers, and unskilled workers, and (3) farmers, owners and tenants. The distribution of the twelfth grade boys in the sample is shown by occupational level of father in Table I. Slightly over one-third of the boys in the sample are sons of white-collar workers, over one-half are sons of manual workers, and about one-tenth are sons of farmers. Three times as many of the boys live in urban as in rural communities.4

The data were analyzed statistically: the responses of the boys are arranged in contingency tables,5 and the degree of association between variables is shown by the values of the corrected coefficients of contingency, computed by means of chi square.
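The article does not say which correction of the contingency coefficient it applied. The sketch below assumes one common version: Pearson's C = √(χ²/(χ² + N)), divided by its maximum √((k − 1)/k), where k is the smaller of the number of rows and columns. This puts the corrected value between 0 and 1. The function names and the counts in the example are invented for illustration and are not taken from the study.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and N for an r x c table of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def corrected_contingency(table):
    """Pearson's C = sqrt(chi2 / (chi2 + N)), divided by its upper bound
    sqrt((k - 1) / k), k = min(rows, cols), so the result lies in [0, 1]."""
    chi2, n = chi_square(table)
    c = math.sqrt(chi2 / (chi2 + n))
    k = min(len(table), len(table[0]))
    return c / math.sqrt((k - 1) / k)


# Invented 2 x 2 counts: complete association versus none.
print(corrected_contingency([[50, 0], [0, 50]]))    # 1.0
print(corrected_contingency([[25, 25], [25, 25]]))  # 0.0
```

Without the correction, C cannot reach 1 for a table of finite size, which makes coefficients from tables of different shapes hard to compare. Dividing by the maximum is one way to put them on a common scale.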


Social Stratification 


In the questionnaire the boys were asked to in- 
dicate the kinds of life work they would like to do 
and the kinds of life work they actually expected 
to do. Their occupational expectations are tabulated in Table II. A very substantial and statistically significant association exists between social stratification and the occupational expectations of the boys, using the fathers' occupational levels as an index.6

In terms of expectations to achieve the higher status occupations, the boys can be ranked in the following order: (1) sons of white-collar workers, (2) sons of manual workers, and (3) sons of farmers. Whereas 30.7 percent of the sons of white-collar workers expect to achieve professional occupational status, only 20.6 percent of the sons of manual workers and 12.0 percent of the farm boys expect to achieve this status. The same relative ranking of the boys from the three social strata holds for expectations to achieve managerial and clerical occupations. In terms of all white-collar occupations, 56.6 percent of the sons of white-collar workers expect to achieve these occupations, while the comparable figures for the sons of manual workers and farmers are 35.6 and 21.4 percent respectively.

TABLE I
DISTRIBUTION OF THE SAMPLE OF 1279 12TH GRADE MICHIGAN BOYS, BY OCCUPATIONAL LEVEL OF FATHER, AND BY RURAL-URBAN RESIDENCE*

Occupational Level of Father      No.    Percent Rural    Percent Urban
White-collar worker               452        21.9             78.1
  Professional                     84        23.8             76.2
  Managerial**                    224        27.7             72.3
  Clerical                        144        11.8             88.2
Manual worker                     719        20.0             80.0
  Skilled                         339        17.7             82.3
  Semi-skilled                    319        22.9             77.1
  Unskilled***                     61        18.0             82.0
Farmer (owner and tenant)         108        69.5             30.5****
Total cases                      1279        24.8             75.2

* The total sample drawn was 1456. Of these, 102 indicated that their fathers were not living and 75 failed to respond.
** Includes proprietors and officials but not farmers.
*** Farm laborers, 3; servants, 14; other laborers, 44.
**** Living in or adjacent to urban communities.

TABLE II
[Occupational expectations of the twelfth grade boys, by occupational level of father and by rural-urban residence; the table is not legible in this copy.]

The relationship between occupational expectations and social stratification is even more pronounced when the sub-occupational strata are used as indices of stratification. Whereas 52.4 percent of the sons of professional workers expect to achieve professional status, only 26.3 percent of the sons of managerial workers and 25.0 percent of the sons of clerical workers have this expectation.

The responses of the twelfth grade boys of Michigan reveal a strong tendency for them to expect a job in the same occupational level as their fathers. The farm boy tends to expect to become a farmer, the son of the manual worker to expect to become a manual worker, and the son of the white-collar worker to expect to become, in turn, a white-collar worker. The occupational expectations of the boys substantially and significantly reflect their positions in the social structure, and this position is largely set by their fathers' occupational level.

A comparison between the occupational aspirations7 and the occupational expectations of the boys reveals a "downward" adjustment. Table III shows the differences between occupational aspiration and expectation, by occupational level of father. In each social stratum the boys tend to aspire to jobs they do not actually expect to achieve. They would like the high status jobs, such as professional, managerial, and skilled worker, but substantial numbers of the boys are realistic enough to recognize that they will not actually get them. This "downward" adjustment between aspiration and expectation is more pronounced among the sons of manual workers and farmers than among the sons of white-collar workers. For example, the excess of aspiration over expectation for professional, managerial, and skilled worker jobs is 22.6 percent for the farm boys, 19.9 percent for the sons of manual workers, and 17.4 percent for the sons of white-collar workers. The Michigan twelfth graders portray the traditional "upward mobility" ideology existing in the United States and, at the same time, reveal a realistic understanding of this ideology.


The Community 


Numerous studies have concerned themselves with differences between rural and urban people. It might therefore be expected that the occupational expectations of youth from rural areas would be significantly different from the expectations of urban youth.8 The responses shown in Table II indicate that the twelfth graders living in or adjacent to urban communities have slightly higher occupational expectations than the boys from rural areas in Michigan. Whereas 25.1 percent of urban youth expect to become professional workers, only 14.5 percent of the rural boys have this expectation. Whereas 43.7 percent of the urban boys expect to become white-collar workers, only 36.1 percent of the rural boys have this expectation.

The fact that the urban boys have "higher" occupational expectations than the rural boys is probably accounted for by two main factors: (1) the urban boys are probably more strongly indoctrinated with the upward mobility ideology existing in the United States, and (2) there are more numerous opportunities for the higher status white-collar jobs in urban than in rural communities.


The Home Situation 


It is by no means easy to study the influence 
of the home on occupational choice because of 
the difficulty in eliminating other factors in op- 
eration. It is generally recognized that the home 
is the basic socialization agency for youth and 
as such would profoundly influence the ideas and 
expectations of young people. In this study this 
influence can be investigated only within the lim- 
its of the data. 

In the questionnaire the twelfth grade boys of 
Michigan were asked how much work they did at 
home, how much spending money they received, 
whether they received this allowance regularly 
or not, the size of their family, their father’s 
formal educational level, the working status of 
their mother, and their sibling position in the 
family. Analysis of the responses revealed no statistically significant relationship between the boys' occupational expectations and the amount of work they did at home, the amount of spending money they received, or whether they got this allowance regularly or not. In other words, whether the boy did much or no work at home, whether he received a large allowance or none at all, or whether he received his allowance regularly or had to ask for it made no significant difference in his occupational expectations.

The sibling position of the boy in the family showed no statistically significant relationship to occupational choice. It made no significant difference in terms of occupational expectations whether the youth was the oldest child, the youngest child, or the "in-between" child.

As shown in Table IV, a very slight but statistically significant relationship exists between the working status of the mother and the occupational expectations of the son.9 The sons of non-working mothers reported slightly higher occupational expectations than sons of working mothers. Apparently experience in the work world tends to make working mothers more realistic about the possibility of achieving high occupational status in the United States, and this realism is probably transferred to the son.

TABLE III
[Occupational aspirations and occupational expectations, by occupational level of father; the table is not legible in this copy.]

TABLE IV
[Occupational expectations, by working status of mother; the table is not legible in this copy.]

TABLE V
[Occupational expectations, by size of family; the table is not legible in this copy.]

The size of the family the boy comes from 
and the formal educational level of the father 
are significantly related to the boys' occupational expectations.10 As shown in Table V, the
smaller the size of the family the higher the oc- 
cupational expectations; the larger the family 
the lower the occupational expectations. As 
shown in Table VI, the higher the formal ed- 
ucational level of the father, the higher the 
son’s occupational expectations; the lower the 
formal educational level of the father, the low- 
er the son’s occupational expectations. For ex- 
ample, whereas about 60.0 percent of the sons 
of fathers with college education expect to be- 
come managerial or professional workers, only 
about 30.0 percent of the sons of fathers with 
grade school education or less have this expectation.

It is very doubtful that the relationship between occupational expectation and family size and father's formal educational level can be attributed to the home influence per se. The family of the boy occupies a position in the social structure and, as shown above, a very substantial relation exists between occupational stratification and the youths' occupational expectations. The interpretation is that family size and father's formal education operate as status variables. The families of small size and the families whose heads have considerable formal education probably enjoy higher social status by virtue of these characteristics, and this social status is reflected in the boys' occupational expectations. Briefly, family size and education of father are probably status variables which bear a relationship to the boys' occupational expectations similar to that of the index of occupational stratification.


The School 


Outside of the home, probably the most important institution for socializing young people is the school. Since the school is dedicated to the task of changing the behavior of young people, it is expected that the school will change youths' occupational expectations. To test this, data were tabulated to compare the occupational expectations of boys in different curricula in the school. A low but statistically significant relationship was found.11 As shown in Table VII, the boys from the white-collar social stratum who enroll in the academic curriculum have slightly higher occupational expectations than the boys from the same social stratum who enroll in the vocational curriculum. Similarly, boys from the manual worker stratum in the academic curriculum have slightly higher occupational expectations than boys from this social stratum in the vocational curriculum. Thus it appears that the occupational expectations of youths from the same social stratum differ according to the curriculum in which they are enrolled. The observed differences in occupational expectations of the seniors in the study probably are produced in two main ways: (1) the course content of the academic curriculum may influence the boys to expect the higher status white-collar jobs, while the course content of the vocational curriculum may influence them to expect lower status manual jobs; and (2) the informal associations among the students in the school may influence the expectations. Sons of white-collar workers typically enroll in the academic curriculum and tend to expect white-collar jobs. These youths may influence the sons of manual workers who enroll in the academic curriculum to likewise expect white-collar jobs. Conversely, sons of manual workers tend to enroll in the vocational curricula and expect manual worker occupations. These youths may influence the sons of white-collar workers who enroll in the vocational curricula to likewise expect manual worker jobs.


Work Experience 


It is a common belief in the United States that work experience is desirable for young people. Employers place great emphasis on the amount and kind of work experience job applicants possess. The underlying assumption is that work experience produces desirable changes in the behavior of young people. While it is impossible to test the "desirableness" of work experience in this paper, it is possible to assess the amount of change which takes place. Table VIII shows the relationship between the amount of time spent on full-time jobs and the boys' occupational expectations.12 The responses of the boys support the following generalization: the less the work experience, the higher the occupational expectations; conversely, the more the work experience, the lower the occupational expectations. Apparently, work experience produces a more "realistic" view of occupational expectations. Those boys with considerable work experience have grappled with the realities of full-time employment; they have gained insights and understandings denied the inexperienced boys; and they consequently have a more realistic basis against which to measure their occupational expectations. Work experience tends to bring a downward adjustment in occupational expectations.

[Table VIII. Occupational expectations, by time on full-time jobs. Legible column headings: Occupational Expectations (Professional, Managerial, Clerical Worker, Skilled Worker, Semi-skilled Worker, Unskilled Worker, Farmer) and No Response. Table body not recoverable from the scan.]

[Table VII. Occupational expectations, by curriculum in which enrolled: table body not recoverable from the scan.]

[Table IX. Occupational expectations, by kind of work experience: table body not recoverable from the scan.]

TABLE X

SUMMARY OF FACTORS ASSOCIATED WITH THE OCCUPATIONAL EXPECTATIONS OF YOUTH, SHOWING THE CORRECTED COEFFICIENTS OF CONTINGENCY

Factor                                    Corrected Coefficient of Contingency
Social Stratification:
    Occupational level of father          0.60
Community:
    Rural-urban                           0.33
Home Situation:
    Work done at home                     0.00
    Amount of allowance                   0.00
    Type of allowance                     0.00
    Sibling position                      0.00
    Working status of mother              0.15
    Educational level of father           0.38
    Size of family                        0.20
School:
    Curriculum                            0.35
Work Experience:
    Number of full time jobs held         0.18
    Time on full time jobs                0.14
    Kinds of full time jobs held          0.27

As shown in Table IX, the occupational expectations of the seniors in this study are also significantly related to the kind of work experience they have had.13 The boys who had held only white-collar jobs have slightly higher occupational expectations than the boys who held only manual worker jobs. This generalization applies to the sons of white-collar workers and to the sons of manual workers. Apparently the youths tend to assimilate the values of the adults with whom they work: whether the youths are from the white-collar stratum or from the manual worker stratum, the boys tend to identify themselves with the ideology of the workers with whom they associate.


Conclusions 


This study has investigated the occupational expectations of a representative sample of twelfth grade Michigan boys. It has contributed to the theory of occupational choice by revealing in quantitative terms some of the social factors operating in the occupational choices of youth. Within the limits of the data, the study confirms the hypothesis that social stratification (corrected coefficient of contingency 0.60, against 0.38 or less for every other factor) is more important in the formulation of youths' occupational choices than are the type of community, the school, work experience, or certain factors in the home situation. Table X summarizes the degrees of relationship between the factors analyzed and the occupational expectations of the boys in the study.
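Every factor in Table X is reported as a corrected coefficient of contingency, but the article does not state which correction was applied. A common one divides Pearson's C = √(χ²/(χ² + N)) by the largest value C can reach, C_max = √((k − 1)/k) with k = min(rows, columns); this maximum is exact for square tables and is the usual approximation otherwise. The sketch below assumes that correction; the function name and the counts in the usage lines are hypothetical, not data from the study.

```python
import math


def corrected_contingency(table):
    """Pearson's contingency coefficient C divided by C_max.

    Assumption: C_max = sqrt((k - 1) / k), k = min(rows, columns); the
    article reports "corrected" coefficients without giving its formula.
    Every row and column total must be positive.
    """
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_tot)

    # Pearson chi-square against the frequencies expected under independence
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    c = math.sqrt(chi2 / (chi2 + n))   # uncorrected contingency coefficient
    k = min(n_rows, n_cols)
    c_max = math.sqrt((k - 1) / k)     # largest C attainable for a k x k table
    return c / c_max


# Hypothetical counts: perfect association, then complete independence
print(corrected_contingency([[10, 0], [0, 10]]))
print(corrected_contingency([[5, 5], [5, 5]]))
```

Dividing by C_max matters because the raw C can never reach 1; without the correction, coefficients from tables of different sizes would not be comparable in a summary such as Table X.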

It may be of importance to educational and 
vocational counselors to recognize the signif- 
icance of social structure in occupational choice. 
The theory of occupational choice can be fur- 
ther refined by studies delineating the relative 
importance of other factors operating in the oc- 
cupational choices of young Americans. 


FOOTNOTES 


1. W. Lloyd Warner, Marcia Meeker, and Kenneth Eells. Social Class in America (Chicago: Science Research Associates, Inc., 1949); W. Lloyd Warner, Robert J. Havighurst, and Martin B. Loeb. Who Shall Be Educated? (New York: Harper and Brothers, 1944); F. W. Taussig and C. S. Joslyn. American Business Leaders (New York: Macmillan Co., 1937); Percy C. Davidson and H. Dewey Anderson. Occupational Mobility in an American Community (Stanford, Calif.: Stanford University Press, 1937).

2. Eli Ginzberg. "Towards a Theory of Occupational Choice," Occupations, XXX, No. 7 (1952), pp. 491-494.

3. W. B. Brookover, and others. Youth and the World of Work (East Lansing, Mich.: Social Research Service, Michigan State College, 1949).

4. The questionnaire did not include a question on rural-urban residence. Rural-urban residence is determined by the location of the school the twelfth grader attended. Urban refers to communities of 2,500 population or more.

5. In this paper only a portion of the contingency tables is shown. For additional statistical data, see E. Grant Youmans, An Appraisal of the Social Factors in the Attitudes and Interests of a Representative Sample of Twelfth Grade Michigan Boys (unpublished Ph.D. thesis, Michigan State College, East Lansing, Michigan, 1953).

6. The degree of association is evidenced by a corrected coefficient of contingency of 0.60, significant above the .001 level of probability.

7. The boys' occupational aspirations are significantly and substantially related to social stratification. The corrected coefficient of contingency is 0.49, significant above the .001 level of probability.

8. The degree of relationship between rural-urban residence and occupational expectation is evidenced by a corrected coefficient of contingency of 0.33, significant above the .001 level of probability.

9. The degree of relationship is evidenced by the corrected coefficient of contingency of 0.15, significant above the .05 level of probability.

10. The degree of relationship between family size and occupational expectation is evidenced by the corrected coefficient of contingency of 0.20, significant above the .05 level of probability; between formal education of father and occupational expectation by the corrected coefficient of contingency of 0.38, significant above the .001 level of probability.

11. The degree of association between curriculum and occupational choice is indicated by a corrected coefficient of contingency of 0.35, significant above the .001 level of probability.

12. The relationship between time on jobs and occupational expectations is very slight. The corrected coefficient of contingency is 0.14, significant above the .05 level of probability.

A similar relationship was found between the number of jobs held and occupational expectations. The fewer the jobs held, the higher the expectations; the greater the number of jobs held, the lower the occupational expectations. In this latter case, the degree of association was indicated by a corrected coefficient of contingency of 0.18, significant above the .05 level of probability.

13. The degree of association is evidenced by a corrected coefficient of contingency of 0.27, significant above the .01 level of probability.




A BRIEF DISCUSSION OF ONE OF THE ANALYSES IN THE EXPERIMENT: THE EFFECT OF THE SYSTEMATIC ANALYSIS OF ERRORS ON ACHIEVEMENT IN THE STUDY OF FRACTIONS*


ORVILLE B. AFTRETH 
Kenwood and Audubon Elementary Schools 
Minneapolis, Minnesota 


Introduction 


THE FOLLOWING discussion of the application of the analysis of variance and covariance (as a two-way classification with one dependent and three independent variables and with unequal numbers in the sub-groups) is given in order to aid experimenters desiring to make similar analyses. As the purpose of this article is to clarify this one analysis, the outcomes of the entire study, which involved eight distinct analyses, will not be emphasized. The implications for methodology are reviewed in Chapter V of the thesis now on file in the University of Minnesota Library, Minneapolis.


The Problem 


The problem of the major investigation was 
to determine to what extent the identification 
and correction of errors embedded in sets of 
examples in addition and subtraction of fractions 
affected learning adversely. Pupils in the ex- 
perimental groups were required to identify and 
correct typical errors embedded in a series of 
nineteen worked-out sets of examples. The pu- 
pils in the control groups merely worked the 
same nineteen sets of examples as practice ex- 
ercises. These exercises paralleled closely 
the step-by-step development of these two pro- 
cesses in the basal textbook Arithmetic We Use, 
Grade 6, which was used in all classes. 

The following chart outlines the general design of the total experiment:

Testing Program and General Design of Experiment in Addition of Fractions

I. Preliminary testing of experimental and control groups
   A. The Kuhlmann-Finch Intelligence Test, Grade VI
   B. The Coordinated Scales of Attainment, Battery VI, Arithmetic Computation
   C. Brueckner Comprehensive Test in the Addition of Fractions, Test A
II. Developmental program
   A. Organization of groups
      1. All-control1**: Group A, Group B
      2. All-experimental2: Group C, Group D
      3. Split-control3: Group E, Group F, Group G
      4. Split-experimental4: Group E, Group F, Group G
   B. Differential treatment
      1. Control group (Group One): solution of systematically selected examples as a drill exercise; no errors present in examples (10 exercises)
      2. Experimental group (Group Two): systematic discovery and correction of errors embedded in the identical worked-out examples (10 exercises)
III. Testing outcomes, all groups
   A. Immediate recall: Test A, repeated at the end of the developmental program
   B. Delayed recall: Test A, repeated three weeks after the end of the developmental program

A similar design was utilized in the experiment in subtraction of fractions.

* Abstract of a Ph.D. dissertation completed at the University of Minnesota, December 1953.
** All footnotes will be found at end of article.

In the discussion of the procedures in the statistical analysis, only the results of the split-control and the split-experimental groups, with the post comprehensive test in addition of fractions as the criterion variable, will be used in this paper. The complete dissertation of which this particular analysis is a part may be found in the University of Minnesota Library.

The purpose of this particular analysis was 
to investigate the differences in achievement in 
addition of fractions, as shown by an immediate 
recall test, when the effects of intelligence, 
computation ability, and previous knowledge of 
addition of fractions are removed. The teacher 
factor is controlled by determining an independ- 
ent source of variation due to teachers and 
thereby removing this factor from the error 
source of variation. 

Table I presents the unadjusted mean values and the standard deviations for each variable in each group. The row means, the column means, and the grand means were computed for each variable. For the X variable (intelligence for pupils in the subtraction of fractions experiment) we note that the row mean for group E (the mean of the split-control and split-experimental sections of group E) is 106.250, while the row means for groups F and G are 103.462 and 106.054 respectively. The column mean for the control group for the X variable (the mean of the split-control sections of all three classes) is 103.780, while the column mean for the experimental group is 106.812. The grand mean (the mean of all 112 pupils, which, because the sections are of unequal size, equals the mean of the row means or of the column means only when each is weighted by its number of pupils) is 105.215. Similar comparisons may be made for the standard deviations as well as for the variables Y, Z, and R.
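A quick arithmetic check shows why the grand mean must be taken as a weighted average here. The sketch below recomputes the X margins of Table I from the cell means and the section sizes (the sizes come from the two-by-three table of Z totals given later in this article); small differences in the third decimal come from the rounded cell means. The helper name is hypothetical.

```python
# Section sizes and X (intelligence) cell means, as (n, mean) pairs for the
# control and experimental sections of each split group (Table I).
cells = {
    "E": [(20, 106.000), (16, 106.563)],
    "F": [(20, 97.750), (19, 109.474)],
    "G": [(19, 107.789), (18, 104.222)],
}


def weighted_mean(pairs):
    """Mean of the pooled observations, given (n, mean) pairs."""
    total_n = sum(n for n, _ in pairs)
    return sum(n * m for n, m in pairs) / total_n


row_means = {group: weighted_mean(pairs) for group, pairs in cells.items()}
column_means = [weighted_mean([cells[g][j] for g in cells]) for j in (0, 1)]
grand_mean = weighted_mean([p for pairs in cells.values() for p in pairs])

# Simple (unweighted) averages of the margins, for comparison
mean_of_row_means = sum(row_means.values()) / len(row_means)
mean_of_column_means = sum(column_means) / len(column_means)

print(round(grand_mean, 3), round(mean_of_row_means, 3), round(mean_of_column_means, 3))
```

The unweighted averages of the row means and of the column means differ from each other and from 105.215; only the pupil-weighted average reproduces the grand mean in the table.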

The observed gains from the pretests to the immediate recall tests and from the pretests to the delayed recall tests are given in Table II.

Table II indicates that in group E the mean gains for the control section exceed those of the experimental section for all four criterion variables, while in group G the mean gains for the experimental section exceed those of the control section for all four criterion variables. As these values are given for descriptive purposes, and as the observed differences may be due to chance errors as well as to the effects of the covariables, it will be necessary, as in analyses 1, 2, 3, and 4, to apply more rigorous research tools. Thus the analyses of variance and covariance will enable the experimenter to partition the variation into that due to method, to teacher, to the interaction of teacher and method, and to error, and at the same time to partial out the effects of intelligence, computation ability, and previous knowledge of addition of fractions.

As no attempt is made to present all of the statistical operations, only the steps in the analysis and the most important calculations are presented. The general model for this analysis (as well as for analyses six, seven, and eight) is as follows:

$$Y_{ijk} = \mu + \rho_i + \gamma_j + \xi_{ij} + \beta_1(X_{1ijk} - \bar X_1) + \beta_2(X_{2ijk} - \bar X_2) + \beta_3(X_{3ijk} - \bar X_3) + \varepsilon_{ijk}$$

For this analysis, having Z as the dependent variable and R, X and Y as the independent variables, we define

$$Z_{ijk} = \mu + \rho_i + \gamma_j + \xi_{ij} + \beta_1(R_{ijk} - \bar R) + \beta_2(X_{ijk} - \bar X) + \beta_3(Y_{ijk} - \bar Y) + \varepsilon_{ijk}$$

For the two equations above,

$$i = E, F, G \ (\text{teacher}), \qquad j = 1, 2 \ (\text{method}), \qquad k = 1, \ldots, n_{ij} \ (\text{individual in a sub-group})$$


The components of a particular observation $Z_{ijk}$, the kth individual in the ith teacher's group taught by the jth method, are thus a general effect ($\mu$), a teacher effect ($\rho_i$), a method effect ($\gamma_j$), a teacher-method interaction effect ($\xi_{ij}$), the effect of computation ability $\beta_1(R_{ijk} - \bar R)$, the effect of intelligence $\beta_2(X_{ijk} - \bar X)$, the effect of previous knowledge of addition of fractions $\beta_3(Y_{ijk} - \bar Y)$, and the error effect ($\varepsilon_{ijk}$). The $\beta$'s are weights assigned to the covariables R, X and Y; each measures the amount of change in Z per unit change in the corresponding independent variable, the other two being held constant.

The following procedure was also followed for analyses 6, 7 and 8 in the complete investigation. This type of analysis is presented by Anderson and Bancroft (1: 281-283), with the statistical procedures outlined by Moonan (unpublished).


Step One: The sums of squares and cross-products around the means of the sub-groups were calculated for all variables and all groups. The within-sub-classes sum of squares for Z, i.e., the sub-group sums of squares summed over all sub-groups, was 5,934.2626, as given in the analysis of variance for Z, Table IIA. This represents the error sum of squares.
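Step One amounts to pooling, over the six teacher-by-method sub-groups, the sums of squares and cross-products of deviations from each sub-group's own means. A minimal sketch in plain Python, using a hypothetical helper `within_sscp` and made-up numbers; applied to the pupils' (R, X, Y, Z) scores, the Z-by-Z entry of the result would be the error sum of squares.

```python
def within_sscp(subgroups):
    """Pooled within-sub-group sums of squares and cross-products.

    subgroups: one list of observations per sub-group; each observation is a
    tuple of variable values, e.g. (R, X, Y, Z).
    Returns S, a p x p list of lists, where S[a][b] is the sum over all
    sub-groups of (v_a - sub-group mean of a) * (v_b - sub-group mean of b).
    """
    p = len(subgroups[0][0])
    s = [[0.0] * p for _ in range(p)]
    for obs in subgroups:
        n = len(obs)
        means = [sum(o[a] for o in obs) / n for a in range(p)]
        for o in obs:
            dev = [o[a] - means[a] for a in range(p)]
            for a in range(p):
                for b in range(p):
                    s[a][b] += dev[a] * dev[b]
    return s


# Hypothetical two-variable example with two sub-groups of two pupils each
print(within_sscp([[(1, 2), (3, 4)], [(0, 0), (2, 4)]]))
```

Deviations are taken from sub-group means rather than the grand mean, so the method, teacher and interaction differences are excluded from this error line by construction.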

Step Two: The analysis of variance and covariance table was set up for all variables and all possible combinations. The method, teacher, interaction of method and teacher, and error variation for all variables and cross-products are thus obtained. The procedure used in working with one variable, Z, is given below.

The two-by-three table was constructed using the number of pupils in each sub-group and the sum of the variable Z for each sub-group (sums in parentheses).


TABLE I

MEAN AND STANDARD DEVIATION VALUES FOR THE VARIABLES X, Y, Z, R FOR THE SPLIT GROUPS E, F, AND G

(X = intelligence; Y = previous knowledge of addition of fractions; Z = post-test in addition of fractions; R = arithmetic computation; GM = grand mean)

                     Control Method       Experimental Method    Row
Variable  Group      Mean       S.D.      Mean       S.D.        Mean         S.D.
X         E          106.000    8.956     106.563    9.926       106.250      9.387
          F           97.750   12.798     109.474   10.689       103.462     11.771
          G          107.789   13.206     104.222   15.031       106.054     14.094
          Mean       103.780   11.627     106.812   (illegible)  GM 105.215  GM 11.772
Y         E           17.000    7.441      18.125    6.652        17.500      7.090
          F           13.650    5.770      18.421   10.200        15.974      7.928
          G           19.579   10.378      13.611    7.686        16.676      9.068
          Mean        16.695    7.820      16.698    8.275       GM 16.696   GM 6.035
Z         E           37.900    3.697      38.939    3.941        38.362      3.805
          F           33.600   10.369      37.211    5.593        35.359      8.042
          G           35.895    5.782      32.889   11.463        34.432      8.546
          Mean        35.797    6.630      36.265    7.088       GM 36.018   GM 6.647
R         E            7.600    3.705       8.125    5.277         7.833      4.404
          F            5.250    3.754       8.263    4.108         6.718      3.926
          G           10.631    6.318       6.556    3.434         8.649      4.915
          Mean         7.779    4.563       7.642    4.232       GM 7.714    GM 6.847


[Table II. Observed gains in addition and subtraction of fractions, split-control and split-experimental groups: table body not recoverable from the scan.]


             Method 1 (control)   Method 2 (experimental)   Total
Teacher E    20 (758)             16 (623)                  36 (1381)
Teacher F    20 (672)             19 (707)                  39 (1379)
Teacher G    19 (682)             18 (592)                  37 (1274)
Total        59 (2112)            53 (1922)                 112 (4034)


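The adjusted5 sum of squares of the columns (the method) is obtained by first setting up the following least squares equations, in which the $\rho_i$ are the teacher constants and the $\gamma_j$ the method constants:

$$
\begin{aligned}
36\rho_1 + 20\gamma_1 + 16\gamma_2 &= 1381\\
39\rho_2 + 20\gamma_1 + 19\gamma_2 &= 1379\\
37\rho_3 + 19\gamma_1 + 18\gamma_2 &= 1274\\
20\rho_1 + 20\rho_2 + 19\rho_3 + 59\gamma_1 &= 2112\\
16\rho_1 + 19\rho_2 + 18\rho_3 + 53\gamma_2 &= 1922
\end{aligned}
$$

Using the formulas given in Anderson and Bancroft (1:282) to eliminate the $\rho$'s, we find

$$
\begin{aligned}
n^*_{11} &= 59 - \left(\tfrac{20^2}{36} + \tfrac{20^2}{39} + \tfrac{19^2}{37}\right) = 27.87572189\\
n^*_{12} &= -\left(\tfrac{20\cdot 16}{36} + \tfrac{20\cdot 19}{39} + \tfrac{19\cdot 18}{37}\right) = -27.87572189\\
n^*_{22} &= 53 - \left(\tfrac{16^2}{36} + \tfrac{19^2}{39} + \tfrac{18^2}{37}\right) = 27.87572189
\end{aligned}
$$

The adjusted C equations (1:283) become

$$
\begin{aligned}
C_1 &= 2112 - \left(\tfrac{20\cdot 1381}{36} + \tfrac{20\cdot 1379}{39} + \tfrac{19\cdot 1274}{37}\right) = -16.617926\\
C_2 &= 1922 - \left(\tfrac{16\cdot 1381}{36} + \tfrac{19\cdot 1379}{39} + \tfrac{18\cdot 1274}{37}\right) = 16.617927
\end{aligned}
$$

Because $n^*_{11} = -n^*_{12} = n^*_{22}$ and $C_1 = -C_2$, the reduced equations $n^*_{11}\gamma_1 + n^*_{12}\gamma_2 = C_1$ and $n^*_{12}\gamma_1 + n^*_{22}\gamma_2 = C_2$ collapse to a single equation in the difference between the method constants:

$$\hat c = \hat\gamma_2 - \hat\gamma_1 = \frac{C_2}{n^*_{22}} = .5961433$$

The adjusted sum of squares for Z along the method line, that is, the adjusted sum of squares of the columns, then becomes

$$SS\,C\,(\text{Adj}) = C_2\,\hat c = 9.9066671$$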
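The elimination of the teacher constants can be checked numerically. The sketch below follows the method of fitting constants for additive teacher and method effects (the least squares equations above contain no interaction term) in plain Python; the function name is hypothetical and the general n* and C formulas are the standard ones for a two-way table with unequal sub-class numbers. It reproduces $n^*_{11}$, $C_1$, $\hat c$ and SS C (Adj) from the sub-group counts and totals.

```python
# (n, sum of Z) for each teacher (row) x method (column) sub-group, from the
# two-by-three table; column 0 = control, column 1 = experimental.
cells = [
    [(20, 758), (16, 623)],   # teacher E
    [(20, 672), (19, 707)],   # teacher F
    [(19, 682), (18, 592)],   # teacher G
]


def reduced_method_equations(cells):
    """Eliminate the teacher constants from the least-squares equations.

    Returns (nstar, c): the reduced coefficients n*_{jj'} and the adjusted
    totals C_j for the method constants (fitting constants, no interaction).
    """
    row_n = [sum(n for n, _ in row) for row in cells]
    row_t = [sum(t for _, t in row) for row in cells]
    m = len(cells[0])
    nstar = [[0.0] * m for _ in range(m)]
    c = [0.0] * m
    for j in range(m):
        col_n = sum(row[j][0] for row in cells)
        col_t = sum(row[j][1] for row in cells)
        c[j] = col_t - sum(row[j][0] * t / n for row, t, n in zip(cells, row_t, row_n))
        for jj in range(m):
            diagonal = col_n if j == jj else 0
            nstar[j][jj] = diagonal - sum(
                row[j][0] * row[jj][0] / n for row, n in zip(cells, row_n)
            )
    return nstar, c


nstar, c = reduced_method_equations(cells)
# With two methods the reduced equations collapse to one equation in the
# difference of the method constants: c_hat = gamma_2 - gamma_1 = C_2 / n*_22.
c_hat = c[1] / nstar[1][1]
ss_c_adj = c[1] * c_hat            # adjusted sum of squares for methods
print(nstar, c, c_hat, ss_c_adj)
```

With more than two methods the reduced system would have to be solved under a side condition such as $\sum_j \gamma_j = 0$; the two-method shortcut used here is valid only because the reduced equations are then proportional.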


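$$\text{Interaction } SS\,Z = SS\,T\,(\text{Unadj}) - SS\,C\,(\text{Adj}) - SS\,A\,(\text{Unadj})$$

where SS A (Unadj) is the unadjusted sum of squares for teachers (rows), SS C (Unadj) that for methods (columns), and SS T (Unadj) the unadjusted sum of squares among the six sub-groups:

$$SS\,A\,(\text{Unadj}) = \frac{(758+623)^2}{36} + \frac{(672+707)^2}{39} + \frac{(682+592)^2}{37} - \frac{(4034)^2}{112} = 307.6033$$

$$SS\,C\,(\text{Unadj}) = \frac{(758+672+682)^2}{59} + \frac{(623+707+592)^2}{53} - \frac{(4034)^2}{112} = 6.1031$$

$$SS\,T\,(\text{Unadj}) = \frac{758^2}{20} + \frac{623^2}{16} + \frac{672^2}{20} + \frac{707^2}{19} + \frac{682^2}{19} + \frac{592^2}{18} - \frac{(4034)^2}{112} = 527.7016$$

Substituting the values of SS T (Unadj), SS C (Adj) and SS A (Unadj) in the interaction SS equation, we have

$$\text{Interaction } SS\,Z = 527.7016 - 9.9067 - 307.6033 = 210.1916$$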


TABLE IIA

ANALYSIS OF VARIANCE FOR Z

Source of Variation                       Degrees of Freedom    SS Z
Method (Adj), or SS C (Adj)               1                     9.9067
Teacher (Adj), or SS A (Adj)              2                     311.4069
Interaction of Teacher and Method,
    or Inter. SS                          2                     210.1916
Error                                     106                   5,934.2626
Total                                     111                   6,461.9642


The SS A (Adj) was obtained by subtraction as follows:

$$SS\,(A + C) = SS\,A\,(\text{Unadj}) + SS\,C\,(\text{Adj})$$

$$SS\,A\,(\text{Adj}) = SS\,(A + C) - SS\,C\,(\text{Unadj}) = 311.407$$
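The unadjusted sums of squares, the interaction sum of squares and the subtraction for SS A (Adj) are ordinary arithmetic on the sub-group counts and totals. A plain-Python check using the two-by-three table; SS C (Adj) is entered as the 9.9067 obtained in the least-squares step rather than recomputed, and the variable names are illustrative only.

```python
# (n, sum of Z) per teacher x method sub-group, from the two-by-three table
cells = [
    [(20, 758), (16, 623)],   # teacher E: control, experimental
    [(20, 672), (19, 707)],   # teacher F
    [(19, 682), (18, 592)],   # teacher G
]

n_total = sum(n for row in cells for n, _ in row)
t_total = sum(t for row in cells for _, t in row)
correction = t_total ** 2 / n_total          # (sum of Z)^2 / N

# Unadjusted SS for teachers (rows), methods (columns) and the six sub-groups
ss_a_unadj = sum(
    sum(t for _, t in row) ** 2 / sum(n for n, _ in row) for row in cells
) - correction
ss_c_unadj = sum(
    sum(row[j][1] for row in cells) ** 2 / sum(row[j][0] for row in cells)
    for j in range(2)
) - correction
ss_t_unadj = sum(t * t / n for row in cells for n, t in row) - correction

ss_c_adj = 9.9067                            # SS C (Adj), taken as given
interaction = ss_t_unadj - ss_c_adj - ss_a_unadj
ss_a_adj = (ss_a_unadj + ss_c_adj) - ss_c_unadj   # SS (A + C) - SS C (Unadj)

print(round(ss_a_unadj, 4), round(ss_c_unadj, 4), round(ss_t_unadj, 4))
print(round(interaction, 4), round(ss_a_adj, 4))
```

Because the sub-class numbers are unequal, SS A (Unadj) and SS A (Adj) differ; the adjusted value is the teacher variation left after the method constants have been fitted.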


The entire procedure given above in step two was carried out for the variables X, Y, R, ZX, ZY, ZR, XY, XR, and YR.

The results, summarized in Table III, indicate the method, teacher, interaction of method and teacher, error, and total variation for each sum of squares and cross-product, and also the regression coefficients which determine the proportionality of Z (the post-test in addition of fractions) associated with X (intelligence), Y (previous knowledge of addition of fractions), and R (arithmetic computation).

Step Three: A multiple regression analysis was run for the error line of the analysis of covariance table by first setting up the normal equations in three unknowns along the error line in order to obtain the b's. The normal equations in three unknowns are set up from the following partial differentiation formula, minimizing f, the sum of the squared errors, with respect to the b's (all variables measured as deviations from their sub-group means):

$$f = \sum_{i}\sum_{j}\sum_{k}\left(Z_{ijk} - b_1R_{ijk} - b_2X_{ijk} - b_3Y_{ijk}\right)^2, \qquad \frac{\partial f}{\partial b_m} = 0, \quad m = 1, 2, 3$$

The corresponding adjusted6 sum of squares along the error line may then be expressed in the following equation:

$$\text{Adj } SS\,Z_e = \sum Z_{ij}^2 - b_1\sum R_{ij}Z_{ij} - b_2\sum X_{ij}Z_{ij} - b_3\sum Y_{ij}Z_{ij}$$


The procedure for setting up the multiple regression problem, including forming the normal equations and solving for the three unknowns, the b's, is given in the discussion of analysis number one. Employing the method of pivotal condensation to obtain the inverse matrix, the values for the b's thus obtained along the error line are:

$b_1 = -.01207101$
$b_2 = .25196626$
$b_3 = .33795205$


If significant results are obtained in the analysis of variance and covariance, the significance of the b's will be calculated. Substituting the values of the b's and the values of the constants from Table III in the formula above for the within adjusted sum of squares for Z, we have

$$\text{Adj } SS\,Z_e = 5{,}934.263 - (-.01207101)(1462.647) - (.25196626)\sum X_{ij}Z_{ij} - (.33795205)(3748.021) = 3270.0502$$
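Step Three reduces to one computation: solve the normal equations built from the error-line sums of squares and cross-products, then subtract the regression sum of squares from SS Z. The sketch below does this with NumPy on simulated data, since Table III's values are not legible in this copy; the seed, group contents and weights are invented, and a direct linear solve stands in for the pivotal condensation used in the study. The adjusted error sum of squares equals the residual sum of squares of an ordinary least-squares fit on the within-sub-group deviations, which is what the normal equations minimize.

```python
import numpy as np

rng = np.random.default_rng(1956)

# Hypothetical data: six teacher x method sub-groups with the study's sizes.
# Columns are R, X, Y (covariables) and Z (criterion); values are simulated.
sizes = (20, 16, 20, 19, 19, 18)
true_b = np.array([0.0, 0.25, 0.34])          # made-up regression weights
groups = []
for size in sizes:
    cov = rng.normal(size=(size, 3)) * np.array([4.0, 12.0, 8.0])
    z = 30.0 + cov @ true_b + rng.normal(scale=5.0, size=size)
    groups.append(np.column_stack([cov, z]))

# Error line: pooled within-sub-group SS and cross-products (Step One)
dev = np.vstack([g - g.mean(axis=0) for g in groups])
S = dev.T @ dev
s_cc, s_cz, s_zz = S[:3, :3], S[:3, 3], S[3, 3]

# Normal equations in three unknowns: S_cc b = s_cz
b = np.linalg.solve(s_cc, s_cz)

# Adj SS Z_e = SS Z_e - b1*sum(RZ) - b2*sum(XZ) - b3*sum(YZ)
adj_ss_ze = s_zz - b @ s_cz
df_adj = len(dev) - len(sizes) - 3            # 112 - 6 - 3 = 103
print(b, adj_ss_ze, df_adj)
```

The three covariables cost three degrees of freedom, which is why the adjusted error line carries 103 rather than 106.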
Step Four: Utilizing the method presented above in step three, the b's associated with the interaction and error line are obtained in the same manner, by adding the values in the interaction and error lines for each variable and cross-product. The values are found in Table III. The b's thus obtained are as follows:

$b_1 = -.01844925$
$b_2 = .24710366$
$b_3 = .34058628$


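The values of the b's and the values of the constants from the interaction and error line in Table III are substituted in the following equation, where $(bC)_{e,I}$ denotes the sum of the products of each b with the corresponding cross-product with Z along the combined line:

$$\text{Adj. } SS\,Z_{e,I} = (SS\,Z_e + SS\,Z_I) - (bC)_{e,I} = 6144.4550 - 2865.1529 = 3279.3021$$

Then, by subtracting the adjusted sum of squares for Z along the error line from the adjusted sum of squares for Z along the error and interaction line, we have

$$\text{Adj. } SS\,Z_I = \text{Adj. } SS\,Z_{e,I} - \text{Adj. } SS\,Z_e = 3279.3021 - 3270.0502 = 9.2519$$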
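Steps Four through Six apply one rule three times: add an effect's line of Table III to the error line, compute the adjusted SS for Z on the combined line, and subtract the adjusted error SS. A minimal NumPy sketch of that rule, with hypothetical function names; the usage values are invented one-covariable matrices, chosen so the result can be checked against the familiar SS Z − (SP XZ)²/SS X.

```python
import numpy as np


def adjusted_ss(s):
    """Adj SS of the last variable (Z) after regression on the others.

    s: symmetric SS/CP matrix whose last row and column belong to Z.
    """
    s = np.asarray(s, dtype=float)
    s_cc, s_cz, s_zz = s[:-1, :-1], s[:-1, -1], s[-1, -1]
    b = np.linalg.solve(s_cc, s_cz)      # the b's along this line
    return s_zz - b @ s_cz


def adjusted_effect_ss(s_error, s_effect):
    """Adj SS for an effect: Adj SS(effect + error line) - Adj SS(error line)."""
    s_error = np.asarray(s_error, dtype=float)
    s_effect = np.asarray(s_effect, dtype=float)
    return adjusted_ss(s_error + s_effect) - adjusted_ss(s_error)


# Hypothetical one-covariable (X, Z) matrices for an error line and an effect line
s_e = [[100.0, 40.0], [40.0, 50.0]]
s_t = [[10.0, 5.0], [5.0, 8.0]]
print(adjusted_ss(s_e), adjusted_effect_ss(s_e, s_t))
```

Fresh b's are computed on each combined line, as the text does for the interaction, teacher and method lines, rather than reusing the error-line b's.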


Step Five: The b's associated with the teacher and error line are obtained by the same procedure as explained in step four above, using the appropriate values along the teacher and error line. The b's thus obtained are as follows:

$b_1 = -.07905180$
$b_2 = .25099284$
$b_3 = .37168611$


The values of the b's and the values of the constants from the teacher and error line in Table III are substituted in the following equations:

$$\text{Adj. } SS\,Z_{e,T} = (SS\,Z_e + SS\,Z_T) - (bC)_{e,T} = 6245.6704 - 2745.4230 = 3500.2474$$

$$\text{Adj. } SS\,Z_T = \text{Adj. } SS\,Z_{e,T} - \text{Adj. } SS\,Z_e = 3500.2474 - 3270.0502 = 230.1972$$


Step Six: The b's associated with the method and error line are obtained by the same procedure as explained in step four above, using the appropriate values along the method and error line. The b's thus obtained are as follows:

$b_1 = -.01119560$
$b_2 = .25048673$
$b_3 = .33870939$

The values of the b's and the values of the constants from the method and error line in Table III are substituted in the following equations:


[Table III. Basic analysis of variance and covariance (two-way) for the post-test in addition of fractions (Z), with intelligence (X), previous knowledge of addition of fractions (Y), and arithmetic computation (R) as the independent variables partialed out: sums of squares, cross-products, and b's for each source of variation. Table body not recoverable from the scan.]

[Table IV. Analysis 5 (two-way): unadjusted and adjusted sums of squares and F ratios for Z, with X, Y, and R partialed out, groups E, F, and G. Table body not recoverable from the scan.]
$$\text{Adj. } SS\,Z_{e,M} = (SS\,Z_e + SS\,Z_M) - (bC)_{e,M} = 5944.1693 - 2673.1937 = 3270.9756$$

$$\text{Adj. } SS\,Z_M = \text{Adj. } SS\,Z_{e,M} - \text{Adj. } SS\,Z_e = 3270.9756 - 3270.0502 = .9254$$


Table IV indicates that the unadjusted F ratio for method was less than 1 and therefore not significant, with degrees of freedom $n_1 = 1$ and $n_2 = 106$. The unadjusted F ratios of 2.781 and 1.877 for the teacher and the interaction factors, respectively, were also not significant. The adjusted F ratio for method was likewise less than 1; with degrees of freedom $n_1 = 1$ and $n_2 = 103$, the obtained value was less than the table value. The F ratio calculated for the adjusted sums of squares thus indicated that the null hypothesis, $\mu_1 = \mu_2$, should be accepted, where $\mu_1$ denotes the parametric mean of the control method and $\mu_2$ the parametric mean of the experimental method. Because this analysis was concerned with the method comparison, the variation due to the teacher and interaction factors having been separated from the total variation, the adjusted teacher effect, significant at the five percent level, was not of concern to the experimenter. The source of variation attributed to interaction was not significant, as indicated in Table IV.
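The F ratios discussed above can be recomputed from the sums of squares reported in Table IIA and in steps three through six; the error mean square uses 106 degrees of freedom before adjustment and 103 after the three covariables are removed. This sketch is only arithmetic on those reported values (the dictionary layout and function name are illustrative). It reproduces the unadjusted ratios of 2.781 and 1.877 and gives an adjusted teacher ratio of about 3.63, which exceeds 3.09, the five percent point of F for 2 and 100 degrees of freedom in standard tables, consistent with the significance reported for teacher.

```python
# Sums of squares for Z as reported in the text: (SS, degrees of freedom)
unadjusted = {"method": (9.9067, 1), "teacher": (311.4069, 2), "interaction": (210.1916, 2)}
adjusted = {"method": (0.9254, 1), "teacher": (230.1972, 2), "interaction": (9.2519, 2)}
error_unadjusted = (5934.2626, 106)
error_adjusted = (3270.0502, 103)    # three covariables remove 3 df


def f_ratios(effects, error):
    """F = (SS_effect / df_effect) / (SS_error / df_error) for each effect."""
    ss_err, df_err = error
    ms_err = ss_err / df_err
    return {name: (ss / df) / ms_err for name, (ss, df) in effects.items()}


f_unadj = f_ratios(unadjusted, error_unadjusted)
f_adj = f_ratios(adjusted, error_adjusted)
for name in ("method", "teacher", "interaction"):
    print(f"{name:12s} unadjusted F = {f_unadj[name]:.3f}   adjusted F = {f_adj[name]:.3f}")
```

Removing the covariables shrinks the error mean square from about 56.0 to about 31.7, which is what lifts the teacher ratio past the five percent point even though the adjusted teacher sum of squares is smaller.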

The null hypothesis, that there was no significant difference between the split-control and the split-experimental sections as indicated by the immediate recall test in addition of fractions, was accepted. The teacher and interaction of teacher and method factors were controlled in this analysis.


FOOTNOTES 


1. Classes in which all pupils received the con- 
trol treatment. 

2. Classes in which all pupils received the ex- 
perimental treatment. 

3. The section of a class (randomly divided into 
two sections) in which the pupils received the 
control treatment. 

4. The section of a class (randomly divided into 
two sections) in which the pupils received the 
experimental treatment. 


5. Adjusted for unequal sub-class frequencies. 
6. Adjusted for covariance effects. 


BIBLIOGRAPHY

1. Anderson, R. L. and Bancroft, T. A. Statistical Theory in Research (New York: McGraw-Hill Book Co., 1952), 399 pp.

2. Brueckner, L. J. "Reliability of Diagnosis of Error in Multiplication of Fractions," Journal of Educational Research, XXVI (November …), pp. …

3. Burton, D. L. "A Comparison of Three Methods of … Appreciation of the … to … Grade Students," unpublished … thesis. Minneapolis: University of Minnesota Library, 1951.

4. Cochran, W. G. "Some Consequences When the Assumptions for the Analysis of Variance Are Not Satisfied," Biometrics, III (March 1947), pp. 22-29.

5. Collier, Raymond O. The Method of Pivotal Condensation in the Calculation of Certain Statistical Quantities, mimeographed. Minneapolis: Bureau of Educational Research, University of Minnesota, May 1952.

6. Fisher, R. A. and Yates, Frank. Statistical Tables for Biological, Agricultural, and Medical Research (New York: Hafner Publishing Co., …), 112 pp.

7. Holsopple, J. Q. and Vanause, L. A. "Note on the Beta Hypothesis of Learning," School and Society, XXIX (1929), pp. 15-16.

8. Hoyt, Cyril. "Test Reliability Estimated by Analysis of Variance," Psychometrika, VI (June 1941), pp. 153-160.

9. Johnson, P. O. Statistical Methods in Research (New York: Prentice-Hall, Inc., 1949), 377 pp.

10. Kruglak, H. Experimental Outcomes of Laboratory Instruction in Elementary College Physics, unpublished Ph.D. thesis. Minneapolis: University of Minnesota Library, 1951.

11. Lowan, A. N. Table of Natural Logarithms, Vols. … and II (New York: National Bureau of Standards, 1941), 501 pp.

12. Moonan, W. J. The Generalization of the Principles of Some Modern Experimental Designs for Educational and Psychological Research, unpublished Ph.D. thesis. Minneapolis: University of Minnesota Library, 1952.

13. Price, R. D. An Experimental Evaluation of the Relative Effectiveness of the Use of Certain Multi-Sensory Aids in Instruction in the Division of Fractions, unpublished Ph.D. thesis. Minneapolis: University of Minnesota Library, 1950.


A FOLLOW-UP OF AN EXPERIMENTAL 
TEACHING METHOD IN MUSIC AT THE 
FOURTH AND FIFTH GRADE LEVELS 


CARL B. NELSON 
State Teachers College 
Cortland, New York 


Introduction 


IN THE MARCH, 1955, issue of the Journal of Experimental Education1* the author reported the results of an experiment he conducted in 1952-53 at the Concord School, Edina, Minnesota. The purpose of that study was to determine what effects might be observed when instrumental training is introduced into the conventional music curriculum. A comparison was made between two types of music programs: instrumental-vocal and vocal alone.

In order that main effects of experimental 
treatment and grade level could be measured 
simultaneously the writer set up a 2 x 2 factor- 
ial design. Thus a control and experimental 
group were selected at the fourth grade and the 
same procedure followed at the fifth grade lev- 
el, making a total of four subclasses. The con- 
trol subgroups were taught primarily by the 
singing approach while the experimental class- 
es had equal amounts of instruction time in vo- 
cal and instrumental performance. 
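The 2 x 2 layout lets the treatment and grade-level main effects, and their interaction, be separated in a single analysis. A minimal sketch of the corresponding sums of squares, assuming equal numbers of pupils per cell and omitting the covariance adjustment the study also applied; the function name and the scores in the usage line are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def two_by_two_anova(cells):
    """Sums of squares for a 2 x 2 factorial with equal cell sizes.

    cells[a][b] is a list of scores for treatment level a (0 = control,
    1 = experimental) and grade level b (0 = fourth, 1 = fifth).
    Returns (SS treatment, SS grade, SS interaction, SS within).
    """
    n = len(cells[0][0])
    cell_means = [[_mean(cells[a][b]) for b in range(2)] for a in range(2)]
    grand = _mean([x for row in cells for cell in row for x in cell])
    a_means = [_mean(cell_means[a]) for a in range(2)]
    b_means = [_mean([cell_means[a][b] for a in range(2)]) for b in range(2)]

    ss_treatment = 2 * n * sum((m - grand) ** 2 for m in a_means)
    ss_grade = 2 * n * sum((m - grand) ** 2 for m in b_means)
    ss_interaction = n * sum(
        (cell_means[a][b] - a_means[a] - b_means[b] + grand) ** 2
        for a in range(2) for b in range(2)
    )
    ss_within = sum(
        (x - cell_means[a][b]) ** 2
        for a in range(2) for b in range(2) for x in cells[a][b]
    )
    return ss_treatment, ss_grade, ss_interaction, ss_within


# Hypothetical scores: two pupils per cell, purely additive effects
print(two_by_two_anova([[[1, 3], [3, 5]], [[5, 7], [7, 9]]]))
```

With equal cell sizes the four sums of squares add up to the total sum of squares; the unequal-numbers adjustment shown in the preceding article is needed only when cell sizes differ.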

Both the control and experimental subgroups had music class daily for the entire school year of 1952-53. Each class period was thirty minutes in duration. In order to minimize the effect of the instructor, the investigator taught all four classes. However, due to circumstances beyond the control of the author, the treatment differences could be extended only over a period of twenty-five consecutive weeks, rather than the entire school year.

The tests used to measure the gains over the period of the experiment were selected to measure the criteria of knowledge of musical notation, audio-visual discrimination, and music preference. The last criterion was tested by means of the Keston Music Preference Test,² and the first two by two tests the author constructed for this purpose. Test 1 measured knowledge of musical notation, and Test 2 was designed to measure the subjects' power of discrimination with respect to musical audio-visual stimuli. All the tests were administered before and after the experimental treatment.

The technique of analysis of variance and covariance served as the main tool in the analysis of the data thus compiled. This device was useful not only because it yielded an exact test of the effects but also because it took into account the subjects’ inequalities in initial abilities and achievements.

*All footnotes will be found at end of article.
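The covariance adjustment described above can be sketched for the simplest case: a single covariate (the pretest score) and one classification (control versus experimental within a grade). This is a minimal sketch of the textbook one-way analysis of covariance, not the author's worksheet. The function name and the scores are hypothetical. The total and the within-class posttest sums of squares are each reduced by their regression on the pretest. The difference between the two reduced sums is the adjusted treatment sum of squares.

```python
from statistics import fmean


def ancova_one_covariate(groups):
    """groups: one list of (pretest x, posttest y) pairs per class.

    Returns (F, df_treatment, df_error) for the covariance-adjusted treatment effect.
    """
    pairs = [p for g in groups for p in g]
    n_total, k = len(pairs), len(groups)
    gx = fmean(x for x, _ in pairs)
    gy = fmean(y for _, y in pairs)

    # Total sums of squares and products about the grand means
    txx = sum((x - gx) ** 2 for x, _ in pairs)
    txy = sum((x - gx) * (y - gy) for x, y in pairs)
    tyy = sum((y - gy) ** 2 for _, y in pairs)

    # Within-class sums of squares and products about each class's own means
    exx = exy = eyy = 0.0
    for g in groups:
        mx = fmean(x for x, _ in g)
        my = fmean(y for _, y in g)
        exx += sum((x - mx) ** 2 for x, _ in g)
        exy += sum((x - mx) * (y - my) for x, y in g)
        eyy += sum((y - my) ** 2 for _, y in g)

    # Posttest variation left after regression on the pretest
    adj_total = tyy - txy ** 2 / txx
    adj_error = eyy - exy ** 2 / exx  # uses the pooled within-class slope
    adj_treatment = adj_total - adj_error

    df_t, df_e = k - 1, n_total - k - 1  # one df lost to the covariate
    f_ratio = (adj_treatment / df_t) / (adj_error / df_e)
    return f_ratio, df_t, df_e


# Hypothetical (pretest, posttest) scores for two classes
control = [(10, 12), (12, 15), (14, 16), (16, 19), (18, 20)]
experimental = [(11, 17), (13, 19), (15, 22), (17, 23), (19, 26)]
f_ratio, df_t, df_e = ancova_one_covariate([control, experimental])
print(f"F = {f_ratio:.2f} with ({df_t}, {df_e}) df")
```

The F ratio is then referred to an F table with the returned degrees of freedom.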


The Follow-Up 


In the spring of 1954, one full year after the experiment was concluded, the investigator again tested the subjects involved in the original experiment. It was known that, during this school year, all the experimental and control pupils had been subjected to the same musical training commonly followed in the Edina system, namely, a major emphasis on singing with no class instruction in instrumental performance.

The purpose of the follow-up was twofold: a) Would the initial gain found in favor of the experimental subclasses be retained after one year of no difference in treatment? b) Would there be, in addition, any significant divergence among the subgroups during the same period of time (1953-54)? It would seem important to discover whether this change in instructional procedure was valuable, with particular regard to its long-term effects on the pupils. The original experiment was designed primarily to discover the effects of day-to-day learning. The longitudinal development of desirable skills and attitudes, however, should be tested and not assumed.

In the initial study, it was discovered that 
the experimental groups, particularly at the fifth 
grade level (the present sixth grade), had gained 
significantly more in terms of the criteria than 
did the controls. Further, it was apparent that 
the same children enjoyed their music class per- 
iods more than did the pupils who were taught 
only by the medium of the voice. Thus it was 
the objective of the follow-up analysis to discov- 
er the direction of further development of the 
subjects involved in the original experiment one 
year later. 

The analysis of the data was complicated by the fact that some of the children had moved out of the Edina school system and could not be reached for testing. Twelve of the original 104 subjects were in this category, reducing the total N to ninety-two. The subclasses were therefore of unequal size, whereas in the initial experiment each subgroup had totaled twenty-six. It must also be pointed out that the tables list the groups by their 1954 status. Although they were fourth and fifth graders during the experiment, they are now correctly identified as fifth and sixth grade pupils.

Another point of interest is that the children had been reassigned to classrooms for the 1953-54 school year. Not only were the ‘‘experimental’’ and ‘‘control’’ subjects broken up into new classes, but some were also assigned to different elementary schools within the district. Therefore, any trace of the original class organization held during the experiment would have to be considered a chance factor. Finally, a canvass of individual instrumental participation outside of music class revealed no group deviation from that of the preceding year.


Analysis of Data 


Because the subgroups were of unequal size at the time of retesting, it was necessary to follow a model that would provide an exact test of the main effects yet take the inequality in sample size into account. In the experiment, the cell frequencies were equal (N = 26) in each of the four subclasses. In the follow-up the frequencies were as follows: fifth grade control = 21; fifth grade experimental = 23; sixth grade control = 26; sixth grade experimental = 22. In a factorial design with unequal frequencies, the sum of squares of any effect is not orthogonal to the sums of squares of the other effects. The variation for one effect must therefore be adjusted for the others.

In a paper prepared by Buchman,³ the steps in a least squares analysis for a situation of this sort are outlined. This model was preferable to an approximation method because it made possible exact tests of the interaction and main effects.

For each test at each administration, a system of four simultaneous equations had to be solved, in this case by pivotal condensation. Each of the three tests was administered on three different occasions: before the experiment began in the fall of 1952, in the spring of 1953, and in the spring of 1954. The process therefore had to be carried through nine times.
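A minimal sketch of such a least-squares computation follows. It assumes effect (+1/-1) coding for the 2 × 2 layout, so the full model has exactly four parameters (general mean, treatment, grade, interaction) and therefore four normal equations. Buchman's actual steps are not reproduced in the article, so this is one standard least-squares formulation rather than his procedure. Gaussian elimination with partial pivoting stands in for pivotal condensation, to which it is closely related. Each effect's sum of squares, adjusted for the others, is obtained by refitting without that effect and taking the increase in residual sum of squares. The helper names and the scores are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        # Eliminate this unknown from every row below the pivot row
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residual_ss(cells, terms):
    """Residual sum of squares of a least-squares fit to a 2 x 2 layout.

    cells: {(i, j): [scores]} with i = treatment (0 or 1) and j = grade (0 or 1).
    terms: which of "a" (treatment), "b" (grade), "ab" (interaction) to fit
    besides the general mean.
    """
    def design_row(i, j):
        a = 1.0 if i == 0 else -1.0
        b = 1.0 if j == 0 else -1.0
        columns = {"a": a, "b": b, "ab": a * b}
        return [1.0] + [columns[t] for t in terms]

    rows, ys = [], []
    for (i, j), scores in cells.items():
        for y in scores:
            rows.append(design_row(i, j))
            ys.append(float(y))

    p = len(rows[0])
    # Normal equations X'X beta = X'y (four equations when every term is fitted)
    xtx = [[sum(r[u] * r[v] for r in rows) for v in range(p)] for u in range(p)]
    xty = [sum(r[u] * y for r, y in zip(rows, ys)) for u in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bk * rk for bk, rk in zip(beta, r)) for r in rows]
    return sum((y - f) ** 2 for y, f in zip(ys, fitted))


# Hypothetical scores with unequal cell frequencies
cells = {(0, 0): [9, 11], (0, 1): [11, 12, 13], (1, 0): [14, 15, 16, 15], (1, 1): [16, 18]}
ss_error = residual_ss(cells, ["a", "b", "ab"])
ss_interaction = residual_ss(cells, ["a", "b"]) - ss_error
ss_treatment = residual_ss(cells, ["b", "ab"]) - ss_error  # adjusted for grade and interaction
print(ss_error, ss_interaction, ss_treatment)
```

The hypothetical cell means (10, 12, 15, 17) are exactly additive, so the interaction sum of squares comes out as zero apart from rounding.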

The assumption of equal variance on all three of the tests for the four subclasses was shown to be met by application of the $L_1$ criterion.⁴ The analysis of variance and covariance was used to test the assumption that the individual regression coefficients were the same as the ‘‘within’’ regression coefficient on all measures. In all cases the null hypothesis of no differences was accepted. So that the following discussion presents no confusion as to the direction of the ratios, Table I lists the means and standard deviations for each group at each test administration.
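The variance check can be illustrated with a short computation. This sketch assumes the usual Neyman-Pearson definition of $L_1$: the weighted geometric mean of the sample variances divided by their weighted arithmetic mean, with weights $n_i/N$ and variances computed with divisor $n_i$. By the weighted arithmetic-geometric mean inequality, $L_1$ never exceeds 1 and equals 1 only when all the variances are equal. Small values count against homogeneity. The critical value must still be taken from a table of the $L_1$ distribution, which this sketch does not compute. The function name and data are hypothetical.

```python
import math


def neyman_pearson_l1(samples):
    """L1 = weighted geometric mean / weighted arithmetic mean of the sample variances."""
    sizes = [len(s) for s in samples]
    total = sum(sizes)
    variances = []
    for s in samples:
        mean = sum(s) / len(s)
        variances.append(sum((v - mean) ** 2 for v in s) / len(s))  # divisor n_i
    log_geometric = sum(n * math.log(v) for n, v in zip(sizes, variances)) / total
    arithmetic = sum(n * v for n, v in zip(sizes, variances)) / total
    return math.exp(log_geometric) / arithmetic


print(neyman_pearson_l1([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # equal spreads
print(neyman_pearson_l1([[1, 2, 3], [0, 5, 10]]))            # unequal spreads
```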

To discover whether the experimental groups retained and/or increased their gain over the control classes during the year after the experiment (1953-54), the investigator made the following computations. The data from each test were analyzed with the pretest (1952 testing) held constant against the follow-up. A separate analysis was carried through with the final test scores (spring of 1953) held constant, in comparison with the follow-up test data. The analysis of variance and covariance procedure therefore had to be completed six times in this instance, twice for each of the three tests. In this manner, a judgment based on an exact test was possible with respect to the long-term value of the skills and preferences developed by the experimental treatment, as well as the effect of grade level. A conclusion could also be reached on a further question: would a significant divergence appear among the subgroups during the following year, when no treatment differences were applied?

Knowledge of Musical Notation—In the first analysis, the writer measured the gain of the experimental groups over the control groups across the two-year period. This was done by applying the analysis of variance and covariance to the data obtained from the 1952 and 1954 administrations of Test 1.

Let $\alpha_i$ be the effect of treatment on achievement and $\beta_j$ the effect of grade level on achievement. The hypothesis that the class means are equal with respect to the treatment effect is expressed as $\alpha_1 = \alpha_2 = 0$. The hypothesis that the class means are equal with respect to the effect of grade level is expressed as $\beta_1 = \beta_2 = 0$. The hypothesis of no interaction may be written $\eta_{ij} = 0$.

Table II shows the test of the hypothesis $\eta_{11} = \eta_{12} = \eta_{21} = \eta_{22} = 0$. The value of F falls between the five and one percent levels of significance, so the hypothesis must remain in doubt. Because of this finding, the variation due to interaction was not pooled with the residual error for the test of the treatment effect. No test was made of the hypothesis $\beta_1 = \beta_2 = 0$, since interaction could not be ruled out and the grades were tested separately.
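This analysis reads each F ratio in one of three ways: reject when P < .01, accept when P > .05, and otherwise leave the hypothesis in doubt. That reading can be sketched as follows. The upper-tail probability of F is computed from the regularized incomplete beta function using a standard continued-fraction evaluation, so only the Python standard library is needed. The degrees of freedom in the example, 1 and 87, are an assumption (92 pupils, four cells, one covariate); the article does not report them.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_upper_tail(f_value, df1, df2):
    """P(F > f_value) for an F distribution with (df1, df2) degrees of freedom."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_value))


def verdict(f_value, df1, df2):
    """Reject below the 1% level, accept above the 5% level, otherwise 'in doubt'."""
    p = f_upper_tail(f_value, df1, df2)
    if p < 0.01:
        return "reject"
    if p > 0.05:
        return "accept"
    return "in doubt"


for f_value in (10.94, 6.17, 1.11):
    print(f_value, verdict(f_value, 1, 87))
```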

Table III summarizes the calculations for the analysis of variance and covariance to test the hypothesis $\alpha_1 = \alpha_2 = 0$ for the fifth grade subjects. The resulting F value is significant at the one percent level, and the null hypothesis is rejected (F = 10.94).

Table IV shows the analysis of variance and covariance to test $\alpha_1 = \alpha_2 = 0$ at the sixth grade level. Again the F value is significant at the one percent level, and the null hypothesis is rejected (F = 31.06).

[Table I. Means and standard deviations of the subclasses on all criteria at each test administration. Table body not recoverable from the scan.]

[Table II. Analysis of variance and covariance to test the hypothesis of no interaction, Test 1, 1952 and 1954 testings. Table body not recoverable from the scan.]

[Table III. Analysis of variance and covariance to test the hypothesis of no difference in class means ($\alpha_1 = \alpha_2 = 0$) at the fifth grade, Test 1, 1952 and 1954 testings. Table body not recoverable from the scan.]

[Table IV. Analysis of variance and covariance to test the hypothesis of no difference in class means ($\alpha_1 = \alpha_2 = 0$) at the sixth grade, Test 1, 1952 and 1954 testings. Table body not recoverable from the scan.]

In the initial experiment, the author found a significant interaction effect; the analysis in the follow-up continues to show the same trend.
However, over the two-year period, significant differences in mean achievement are found at both grade levels in favor of the experimental groups. Immediately after the experiment, a significant difference appeared in favor of the fifth grade experimental class only (the present sixth grade). Thus we conclude that there was significant growth at both grades and that the long-term effects are significant on the criterion of knowledge of musical notation.

A similar test was made using the data of the follow-up, holding constant the 1953 test scores. In this instance the F values for the interaction effect and for the effects of treatment and grade level were not significant, and the null hypotheses were accepted. Therefore, over the year of no difference in instruction (1953-54) taken alone, no significant differences appeared among the subclasses.

Audio-Visual Musical Discrimination—Analysis of the data from Test 2 over the two-year period showed a significant interaction effect (F = 8.46) at the one percent level. A separate test on the achievement of the fifth grade subjects showed no significance (F = 1.12), which is statistically the same finding as at the end of the initial experiment. However, when the effect of treatment on achievement was tested with the sixth grade children, a significant difference in favor of the experimental group was found (F = 7.38). In the 1953 analysis, the null hypothesis for this grade level on the Test 2 data had remained in doubt. Thus it is clearly shown that the treatment effected a significant change in achievement over the two-year period, a conclusion that was not possible after one year.

As in the previous instance, however, hold- 
ing the 1953 scores constant and measuring the 
differences at the end of the 1954 period, no sig- 
nificant divergence appeared. In this analysis, 
the hypotheses of no interaction, of no difference 
in class means due to treatment, and no differ- 
ence in class means due to grade level were ac- 
cepted. 

Music Preference—The findings from an analysis of the data obtained from the Keston Music Preference Test did not reveal the same trend as the previous tests. In 1953, there was no significant interaction, and the main effect of treatment showed a significant difference between the experimental and control subclasses ‘‘in favor’’ of the experimental subjects. The analysis of the main effect of grade level yielded a non-significant ratio.

In 1954, the year of the follow-up, the analysis of variance and covariance applied to the 1952 and 1954 data showed markedly different values. First, the hypothesis of no interaction remained in doubt (F = 6.17). Second, the test of the hypothesis $\alpha_1 = \alpha_2 = 0$ at the fifth grade level showed a non-significant difference between the experimental and control pupils (F = 1.11). Third, the test of the hypothesis $\alpha_1 = \alpha_2 = 0$ at the sixth grade level showed that the hypothesis must remain in doubt (F = 6.62, .01 < P < .05). Since the test of no interaction resulted in a value in the region of doubt, the variation due to interaction was not pooled with the error term. Consequently, because the subclasses were tested separately at each grade level, no test of the hypothesis $\beta_1 = \beta_2 = 0$ was made.

Since the analysis over the 1952-54 period showed no significant findings for any of the effects, one would expect the year 1953-54 alone to produce no difference either. This expectation was tested, however, and proved correct.


Summary and Conclusions 


The purpose of the 1954 follow-up was to study the effects of the 1952-53 experiment by testing the further development of children registered in the subclasses. The writer points out that, during the year 1953-54, there were no treatment differences among the subgroups. In summary, the important findings of this analysis are as follows:


1. The fifth grade ‘‘experimental’’ subjects (who 
were fourth graders during the year of the ex- 
periment) were significantly better than the 
controls with respect to the criterion of 
knowledge of musical notation. It is import- 
ant to remember that this difference was not 
noted immediately after the experiment in 
1953. 


2. The sixth grade ‘‘experimental’’ pupils (who
were fifth graders during the experiment) re- 
mained significantly better than the controls 
on the basis of the knowledge of musical nota- 
tion criterion. 


3. The sixth grade ‘‘experimental’’ subjects
were significantly more capable in audio-vis- 
ual musical discrimination in 1954 than the 
controls. This was not true in 1953 at the 
close of the experimental period, although the 
hypothesis was in doubt at that time. 


4. The fifth grade ‘‘experimental’’ pupils showed
no significant difference from the controls on the
audio-visual criterion. In 1953, this group had not
developed higher skills than the controls in the
recognition of musical audio and visual stimuli, either.


5. Neither the fifth nor the sixth grade ‘‘experimental’’
children showed a greater preference for ‘‘better’’ music
than the controls in 1954. This statement cannot
be made with certainty in the case of the sixth
graders, since the hypothesis remained in
doubt, but the trend is downward for these
pupils. Also, since the differences statistically
disappeared at the fifth grade level, the
assumption gains strength that there were no
real differences among any of the subclasses.


6. On all three criteria, no real differences 
emerged between experimental and control 
subgroups when only the year 1953-54 was 
considered. 


These data would appear to imply that abilities directly applicable to the development of skills related to the mastery of the musical score are retained better than attitudes as measured by music preference. The findings indicate further that instrumental music instruction enriches the average child’s musical background, so that musical skills of a higher order can be attained more readily. It is doubtful that the advantages thus gained will be lost to the pupils, particularly as these skills continue to be reinforced during the ensuing school years.

The fact that the younger experimental pupils did not differ significantly from the controls with respect to the higher skills of discriminating audio and visual musical stimuli raises interesting points. The differences in the older children grew until they were significantly in their favor at the end of 1954. One might therefore conclude that, on the basis of this criterion, it is not economical to introduce instrumental music instruction into the curriculum until the fifth grade. Perhaps a subsequent testing would yield data helpful in making a judgment on this point. Unquestionably, however, the abilities measured by Test 2 are of a higher order. It is possible that a certain level of maturity must be reached before children can be expected to grow in musicality more through instrumental training than through a straight vocal program.

It is of interest to note here that Trabue⁵ concluded from his research that music appreciation, which he equates with music preference, is a highly specialized trait remarkably susceptible to training. Unfortunately, he did not report measures of the retention of this behavior in children. He notes that most of his experiments were of short duration and that their effects were immediately noticeable. Might they not vanish as readily? The data from the present analysis suggest that such might be the case. The implication, of course, is that a training period to induce this desirable attribute in school children should extend over a longer span of time than one year, if not become an ongoing plan of instruction.

In general then, it would appear that the val- 
ues derived from an instrumental-vocal program 
as outlined in this study indicate the long-term 
effectiveness desirable in any system of instruc- 
tion. However, the data suggest it would be 
more advisable to introduce such a curriculum 
no earlier than the fifth grade. 


FOOTNOTES 


1. Carl B. Nelson. ‘‘An Experimental Evaluation of Two Methods of Teaching Music in the Fourth and Fifth Grades,’’ Journal of Experimental Education, XXIII (March 1955), pp. 331-38.

2. Morton J. Keston. An Experimental Evaluation of the Efficacy of Two Methods of Teaching Music Appreciation, unpublished Ph.D. thesis, University of Minnesota, 1949.

3. Roland Buchman. The Least Squares Analysis of a 2-Way Factorial Experiment With Unequal Frequencies in the Cells, mimeographed (Minneapolis: Bureau of Educational Research, University of Minnesota, May 12, 1954), pp. 7.

4. Palmer O. Johnson. Statistical Methods in Research (New York: Prentice-Hall, Inc., 1949), pp. 82-86.

5. M. R. Trabue. ‘‘Scales for Measuring Judgment of Orchestral Music,’’ Journal of Educational Psychology, XVI (December 1923), pp. 545-561.


DOES AN HONOR SYSTEM REDUCE
CLASSROOM CHEATING? AN EXPERIMENTAL
ANSWER


RAY R. CANNING 
Brigham Young University 


OPPORTUNITIES for experimental research in the behavioral sciences are severely limited. However, an occasional situation arises in which human beings can be manipulated, either voluntarily or without their knowledge. The classroom offers such a research environment, and in this case it was used to determine the cheating practices of university students.


The Technique 


Previous studies of honesty among students 
have utilized a variety of techniques,¹* some
of which were used in this study to test validity 
and reliability. However, for the experiment 
reported here, one simple technique was repeat- 
ed five times over a period of six years: 


1. After regular examinations were collected 
from lower division sociology students, du- 
plicate copies of the students’ answers were 
carefully recorded for later comparisons. 


2. These duplicate test papers were then corrected
and graded, but no markings were made on
the original examination papers.

3. At the next class session the unmarked originals
were returned to their owners with the
implication that the instructor had not yet
had time to correct them. ‘‘Aid’’ from the
students was solicited, and each was ‘‘permitted’’
to ‘‘correct’’ his own paper.

4. At the end of this experimental period, the
papers were again collected, and any changes
the students had made on the examination
papers were also recorded on the duplicate
sheets.

5. The tabulated differences then became the data
of this cheating experiment.
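The comparison in steps 1 to 5 can be sketched at the level of total scores. The sketch assumes each paper reduces to a single total: the instructor's grade on the duplicate copy versus the total the student reported after ‘‘correcting’’ his own paper. The article tabulated changes item by item, so comparing totals is a simplification. All names and scores below are hypothetical.

```python
def points_cheated(instructor_scores, self_corrected_scores):
    """Return {student: points gained} relative to the instructor's grading of the duplicate."""
    return {
        student: self_corrected_scores[student] - true_score
        for student, true_score in instructor_scores.items()
    }


def summarize(gains):
    """Share of students who gained points, and mean points gained per cheating student."""
    cheaters = [g for g in gains.values() if g > 0]
    rate = len(cheaters) / len(gains)
    mean_gain = sum(cheaters) / len(cheaters) if cheaters else 0.0
    return rate, mean_gain


# Hypothetical class of four students
instructor = {"s1": 70, "s2": 55, "s3": 82, "s4": 60}
self_corrected = {"s1": 70, "s2": 66, "s3": 82, "s4": 68}
gains = points_cheated(instructor, self_corrected)
rate, mean_gain = summarize(gains)
print(gains, rate, mean_gain)
```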


The Time Period 


In 1948, one year before an Honor System was established at the Brigham Young University, the first experiment was made upon what will be referred to as Class A. This group of students will be considered the ‘‘Before’’ part of the total experiment. During the years of introduction and revision of the Honor Code and System (1949-1953), three other classes (Classes B, C, and D) were studied. They constitute the ‘‘During’’ part of the experiment. Finally, five years after the inauguration of the System (1954), a follow-up study was made (Class E), which will be called the ‘‘Now’’ stage of the experiment.

*All footnotes will be found at end of article.


The Sample 


Five lower-division sociology classes were 
used in the experiment proper. Their 299 stu- 
dents were divided up as follows: 


In the ‘‘Before’’ group there were 48 stu- 
dents, 181 in the ‘‘During’’ group, and 70 
in the ‘‘Now’’ group. In addition to this 
research sample, three classes (X, Y,
and Z) containing 71, 38, and 96 students,
respectively, were used in validity and
reliability exercises. Similar experiments
were made with 109 of these students but
by different instructors.² Class Z (96
students) was experimented with by other
techniques.³


Although a sample of 299 students is large 
enough for most statistical manipulations, it is 
well to note that in both the ‘‘Before’’ and ‘‘Now’’
groups the number is small. 


Standardization 


Attempts were made to keep tests, samples, classroom conditions, and methods of conducting the experiment comparable throughout the ‘‘Before’’, ‘‘During’’, and ‘‘Now’’ periods. One instructor performed all experiments; the classes had covered similar subject matter; the experiment was performed at the same time in the quarter; and the approach to the class was standardized. Throughout the six-year period, no information was divulged that might have alerted the students to the nature of the experiment.






Findings 
A. Changes in Incidence of Cheating 


Of all students studied during the six years of experimentation, 45 percent cheated in the controlled examinations. The total percentage is less important, however, than the change in the percentage of students cheating before, during, and after the introduction of the Honor System. In the ‘‘Before’’ period, a high 81 percent of the students cheated. This figure fell to 41 percent in the ‘‘During’’ period and finally to 30 percent in the ‘‘Now’’ period (see Table I). The before-Honor System high was cut in half by the end of the first three years of the Honor System, and in five years it was reduced by nearly two-thirds (63 percent). Tests were made of these percentage reductions to determine whether the differences were statistically significant.⁴ There is less than one chance in 1000 that such a difference could have resulted from chance factors. Of course, this does not prove a causal relationship between the Honor System and a reduction in cheating, but the likelihood should be noted.
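The article does not say which significance test footnote 4 refers to. A pooled two-proportion z test is one standard choice, sketched below for the ‘‘Before’’ versus ‘‘Now’’ comparison. The counts are an assumption, reconstructed from the rounded percentages: about 39 of 48 students (81 percent) and 21 of 70 students (30 percent).

```python
import math


def two_proportion_z(cheated1, n1, cheated2, n2):
    """Pooled two-sample z test for a difference between two proportions."""
    p1, p2 = cheated1 / n1, cheated2 / n2
    pooled = (cheated1 + cheated2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided


# Counts implied by the rounded percentages (assumption)
z, p = two_proportion_z(39, 48, 21, 70)
print(f"z = {z:.2f}, two-sided p = {p:.1e}")
```

The resulting probability is far below .001, consistent with the statement in the text.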


B. Male-Female Comparisons 


It may be noted also from Table I that in 
every period more female students cheated than 
did the males. The comparative total was 78 to 
57, respectively. Of the total cheating group, 
58 percent were women and 42 percent were 
men. This statistic by itself, however, is 
misleading. The proportion of cheaters of ei- 
ther sex must be compared to the proportion 
within the total sample of members of that sex. 
Thus the 58 percent and 42 percent above should 
be compared respectively to 56 percent of the 
total group who are females and 44 percent who 
are males. Table II shows these relationships.

Among the students who cheated in the ‘‘Before’’ group, 59 percent were women, although women made up 62 percent of the total ‘‘Before’’ group. The men in this group therefore cheated out of proportion to their number in the total group. In the ‘‘During’’ period the proportions by sex in the total and cheating groups were exactly the same. In the ‘‘Now’’ period, however, the women cheated disproportionately: they made up 62 percent of the cheating sample but only 53 percent of the total sample.⁵ The rise and decline by sex should also be noted. Among the cheaters, the percentage who were male rose from 41 percent to 44 percent and then dropped to 38 percent. Over the same span, the percentage who were female rose from 59 percent (‘‘Before’’) to 62 percent (‘‘Now’’).
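The comparison above amounts to dividing the women's share of the cheaters by their share of the whole group. A ratio above 1 means over-representation among cheaters. The percentages come from the text. The ‘‘During’’ figure of 56 percent is derived from it: 44 percent of the cheaters were men, and the proportions were the same in both groups. The helper name is hypothetical.

```python
# Percent women (among cheaters, in the whole group) for each period
women_share = {
    "Before": (59, 62),
    "During": (56, 56),  # derived: proportions "exactly the same", 44% men
    "Now": (62, 53),
}


def representation_ratio(share_among_cheaters, share_in_group):
    """Ratio > 1: over-represented among cheaters; < 1: under-represented."""
    return share_among_cheaters / share_in_group


ratios = {period: representation_ratio(*shares) for period, shares in women_share.items()}
for period, ratio in ratios.items():
    print(f"{period}: {ratio:.2f}")
```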

Similarly, there were changes in the average number of points⁶ cheated. In the ‘‘Before’’ period, the cheating males averaged 11.6 points, with a range of 4 to 24 points, while the cheating females averaged 12.9 points, with a range of 4 to 30 points. By the ‘‘Now’’ period the males had reduced their average to 9.1 (R = 3-17). They still had a higher average than the females, who had dropped to 7.6 points cheated per cheating student (R = 3-17). Again, however, the standard errors of these differences indicate that they are probably products of chance.

Significance is apparent, though, when the average points cheated by the total ‘‘Before’’ group (12.3 points) are compared with the average for the ‘‘Now’’ group (8.2 points). Not only are fewer students cheating, but their cheating is of lesser magnitude.


C. Methods of Cheating 


Four methods of cheating were discernible in the experiment: (1) wrong answers were erased or crossed through and correct answers inserted, (2) previously left blanks were filled with correct answers, (3) incorrect answers were not marked as such, and (4) arithmetical ‘‘mistakes’’ favorable to the students were made.

There were 427 individual cases of cheating 
divided among these four types as follows: 


135 cases of ‘‘filled blanks’’ (32%) 

125 cases of ‘‘changed answers’’ (29%) 
95 cases of ‘‘ ‘arithmetical’ mistakes’’ (22%) 
72 cases of wrong answers not checked (17%) 
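Sorting each altered item into the four categories can be sketched by comparing the duplicate record with the returned paper. The item representation and function names are hypothetical: the recorded answer (None for a blank), the returned answer, whether the student marked the item correct, and the answer key. The article does not describe how its cases were coded. The last lines recompute the percentages from the 427 cases reported above.

```python
from collections import Counter


def classify_item(recorded, returned, marked_correct, key):
    """Label one item by comparing the duplicate record with the returned paper.

    recorded: answer on the duplicate copy (None if the item was left blank)
    returned: answer on the paper after the student's 'self-correction'
    marked_correct: whether the student marked the item as correct
    key: the correct answer
    """
    if returned != recorded:
        if returned == key:
            return "filled blanks" if recorded is None else "changed answers"
        return None  # an alteration outside the four categories
    if recorded != key and marked_correct:
        return "wrong answers not checked"
    return None


def favorable_arithmetic(points_marked_correct, reported_total):
    """True when the reported total exceeds the sum of the student's own item marks."""
    return reported_total > sum(points_marked_correct)


def percentages(counts):
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# Hypothetical items: (recorded, returned, marked_correct, key)
items = [(None, "b", True, "b"), ("a", "c", True, "c"), ("a", "a", True, "d"), ("d", "d", True, "d")]
labels = Counter(filter(None, (classify_item(*item) for item in items)))
print(labels)

# The totals reported in the text
reported = Counter({"filled blanks": 135, "changed answers": 125,
                    "arithmetical mistakes": 95, "wrong answers not checked": 72})
print(sum(reported.values()), percentages(reported))
```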


Although ‘‘filling in blank answers’’ was the most popular form of cheating over the entire experiment, this was not so in either the ‘‘Before’’ or the ‘‘During’’ period. Before the Honor System, over half (53 percent) of the cheating cases were by ‘‘Changing-of-Correct-For-Incorrect-Answers.’’ During the first three years of the system, ‘‘Poor Arithmetic’’ was the most frequently used device (31 percent of the cases).

Sex differences are reflected in the most popular forms of cheating throughout the different periods studied. In the ‘‘Before’’ period ‘‘Changing Answers’’ was the most popular form of cheating for both sexes (M = 62 percent, F = 41 percent). In the ‘‘Now’’ period ‘‘Filling Blanks’’ (M = 79 percent, F = 61 percent) was the most popular cheating technique used by students of both sexes.

The sexes ‘‘Changed Answers’’ and ‘‘Filled 
Blanks’’ proportionately, but the male students 
outdistanced the female students in ‘‘Not Checking
Wrong Answers,’’ while the women surpassed
the men in ‘‘Making Favorable Arithmetic Mis- 
takes.’’ 


[Tables I and II. Number and percent of students who cheated, by sex and period, and the sex composition of the total group and of the cheating group. Table bodies not recoverable from the scan.]


D. Cheating Related to Use of Pen or Pencil 


A larger share of the students using pencils cheated than of those using pen and ink (pencils, 47 percent; pens, 36 percent), a difference of about 11 percentage points. Cheating students preferred pencils over pens by a ratio of 6 to 1; the non-cheating students, by 4 to 1. A three-period comparison shows a decided change in this preference. Cheaters in the ‘‘Before’’ period, when the rate of cheating was highest, had a pencil-to-pen ratio of 19 to 1. This compares with 6 to 1 in the ‘‘During’’ period and 3 to 1 in the ‘‘Now’’ period, which was typified by a relatively low cheating rate.


E. Cheating as Related to High Scores 
on the MMPI 


All students in the research population who had any score of 70 or
above on the Minnesota Multiphasic Personality Inventory, or a score
of 60 or above on the Lie scale, were classified according to their
cheating or non-cheating behavior in the experiment. Fifty-seven
students had T scores of 70 or above on one or more of the following
scales: Hypochondriasis, Depression, Hysteria, Psychopathic Deviate,
Interest, Paranoia, Psychasthenia, Schizophrenia, or Hypomania.
Thirty-two students had Lie scores of 60 or above. Of the 57, 28
cheated and 29 did not; of the 32, 15 cheated and 17 did not. Thus,
students with high MMPI scores do not appear to be differentiated in
their cheating behavior: roughly half of each high-scoring group
cheated.


F. Cheating and Academic Proficiency 


The five highest and five lowest test scores were averaged for the
cheaters and for the non-cheaters in each of the three periods of the
experiment. A marked difference was noted in each case. The mean score
of these fifteen highest cheaters was 70, compared to 88 for the
fifteen highest non-cheaters. The fifteen lowest scores averaged 47
for the cheaters but 54 for the non-cheaters. Furthermore, the fifteen
highest cheaters raised themselves an average of 15 points. It is
clearly evident that points cheated are inversely related to test
scores.

High school grade-point averages were com- 
puted for the cheaters and the non-cheaters. 
The cheating students averaged 1.98 grade points 
before coming to the University while the non- 
cheating students averaged 2.07 grade points. 


Verbalizations vs. Overt Behavior in the 
Cheating Experiment 


Prior to the experiment, a situational ques- 
tionnaire was administered to the students in 
order to find the relationship between verbal- 
izations and cheating behavior, i.e., promised 


(Vol. 24 


and actual behavior. Among the questions posed for the students’
consideration was the following: ‘‘You have an opportunity to change
your score on an examination; (you find the instructor’s roll book, or
discover some other technique) WHAT WOULD YOU DO?’’ Twenty-six students
did not answer. Of the 272 who did answer, 231 (85 percent) pledged
that they would not cheat, while 41 (15 percent) said they would raise
their grades. By comparison, in the experiment itself, 150 (55
percent) did not cheat and 122 (45 percent) did cheat. Cross-
classifying the two responses shows that a total of 89 people (31 men
and 58 women) both cheated and lied, one-third of the total group who
answered the questionnaire.

Table III shows further that 33 students (12 percent) cheated as
promised but did not lie about it; 142 students (52 percent) neither
cheated nor lied; and 8 students (3 percent) did not cheat in the
experiment although they had previously stated they would cheat.
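The four counts quoted here form a two-by-two cross-classification, and its margins must reproduce the totals given earlier (272 respondents, 122 cheaters, 231 ‘‘No’’ answers). A minimal Python sketch that builds the table from the reported counts only and checks the margins and percentages (the cell labels are ours):

```python
# Cross-classification of questionnaire answer by behavior in the experiment,
# built only from the counts reported in the text (272 respondents).
# "yes" = said they would raise their grade; "no" = pledged not to cheat.
cells = {
    ("cheated", "no"): 89,          # cheated after promising not to
    ("cheated", "yes"): 33,         # cheated as promised
    ("did not cheat", "no"): 142,   # neither cheated nor lied
    ("did not cheat", "yes"): 8,    # promised to cheat but did not
}

respondents = sum(cells.values())

# Row and column margins of the two-by-two table.
cheated = cells[("cheated", "no")] + cells[("cheated", "yes")]
did_not_cheat = respondents - cheated
answered_no = cells[("cheated", "no")] + cells[("did not cheat", "no")]
answered_yes = respondents - answered_no

# Each cell as a whole-number percentage of all respondents.
cell_percents = {key: round(100 * n / respondents) for key, n in cells.items()}

print(respondents, cheated, did_not_cheat, answered_no, answered_yes)
print(cell_percents)
```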


Summary and Limitations 


After five years of testing under an Honor System at Brigham Young
University, rates of cheating (of four specific types) were reduced by
63 percent in lower division sociology classes. Similarly, the average
magnitude of cheating was less: before the Honor System the average
was 12.3 points per cheating student per test; after five years of the
Honor System this average was reduced to 8.2 points.

Before the Honor System, male students 
cheated slightly out of proportion to their number 
in the total group, but after five years of the Sys- 
tem, this proportion was reduced and the women 
students cheated disproportionately. 

Although in general the favorite method of cheating was writing in
correct answers for questions that had been left blank during the
examination, prior to the Honor System the favorite device was
changing answers. Male students failed to check wrong answers more
frequently than did women students, who, in turn, were more adept at
making favorable arithmetic ‘‘errors.’’

Pencil-users cheated more frequently than students who used pen and
ink. And the decline in cheaters’ preference for pencils is directly
related to the reduction in cheating itself throughout the experiment.

Among students who scored high on any of the scales of the Minnesota
Multiphasic Personality Inventory, cheaters and non-cheaters were not
differentiated. However, in terms of high school grade-point averages,
the non-cheaters surpassed the cheaters, 2.07 to 1.98 grade points.
The two groups were also differentiated by average test scores, the
cheaters consistently falling below the non-cheaters. Furthermore, the
number of points students raised their scores was inversely related to
their correct test scores, i.e., ‘‘poorer’’ students ‘‘raised’’
themselves more points than did the ‘‘better’’ students.


TABLE III

STUDENTS WHO ANSWERED SITUATIONAL QUESTIONNAIRE ‘‘YES’’ OR ‘‘NO’’
ACCORDING TO WHETHER OR NOT THEY CHEATED IN THE EXPERIMENT

                       No. Who             No. Who
                       Answered ‘‘Yes’’    Answered ‘‘No’’    Total

Cheated:
    Males                                       31
    Females                                     58
    Both Sexes                33                89             122

Did Not Cheat:
    Males
    Females
    Both Sexes                 8               142             150

Total Group                   41               231             272

Verbalizing about honesty was relatively easy. Of the 272 students who
answered a situational questionnaire designed to test the relationship
between promised behavior and actual behavior, 33 percent cheated
after promising that they would not; 12 percent cheated as they
promised they would; 52 percent neither lied nor cheated; and 3
percent promised to cheat but failed to do so.

The findings of this study must be interpreted only in view of its
many limitations: the small number of people involved in some classes,
the lack of further controls, the ‘‘temptation’’ conditions for
certain types of cheating, and other unknown variables. The results
should not be considered representative of larger groups or of
conditions other than those specifically described above.


FOOTNOTES 
1. For one example, see Harold T. Christensen,
‘‘. . . Experiment in Honesty,’’ Social Forces (March 1948).


2. In Class X (a lower division religion class of
75 students) and Class Y (a lower division
sociology class of 38 students), 45 students,
or 41 percent, cheated in the experiment.


3. They were made to witness cheating in the 
classroom in order that their reactions 
might be recorded. This experiment will 
be reported in another paper. 


4. Chi-square = 17.356, df = 2, P < .001.


5. These differences are not statistically signif- 
icant. Chi-square test of the percent of 
women among the cheating population as 
compared to the percent among the total 
population, ‘‘Before’’, ‘‘During’’, and 
‘‘Now’’ indicates that a greater difference 
could occur 40-50 times in 100 among other 
samples due to chance. 


6. ‘‘Points’’ hereafter will mean ‘‘points-per-
100-possible.’’ This designation is used in lieu
of percent, inasmuch as the word ‘‘percent’’ is
repeated so frequently in this paper.


7. Significant beyond the .01 level of significance. 
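The probability level in footnote 4 can be checked without tables: with two degrees of freedom the chi-square distribution is exponential with mean 2, so its upper-tail probability is exp(-x/2). A minimal Python sketch (the function name is ours, and the closed form holds only for 2 df):

```python
import math


def chi2_upper_tail_df2(x: float) -> float:
    """P(chi-square >= x) for exactly 2 degrees of freedom.

    With 2 df the chi-square distribution is exponential with mean 2,
    so the upper-tail probability is exp(-x / 2). This closed form does
    not hold for other degrees of freedom.
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.exp(-x / 2)


p = chi2_upper_tail_df2(17.356)  # statistic reported in footnote 4
print(f"P = {p:.5f}")            # about .00017, well below .001
```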


THE COMPARABILITY OF THE SIMPLE DIS- 
CRIMINANT FUNCTION AND MULTIPLE 
REGRESSION TECHNIQUES 


WILLIAM B. MICHAEL 
University of Southern California 
NORMAN C. PERRY 
Alabama Polytechnic Institute 


FIRST SUGGESTED by R. A. Fisher (1) and introduced to psychologists by
Travers (4), the simple linear discriminant function has also been
considered at length by Garrett (2) and briefly by Wherry (5).
Although Garrett has shown numerically the identity, or
proportionality, of the weights obtained by the discriminant function
to those realized by a multiple-regression approach in which each of
the independent variables is related to the dichotomous criterion of
classification through use of the point biserial coefficient, an
analytic demonstration of the equivalence of the two techniques was
not forthcoming. Through use of either the point biserial coefficient
or the biserial coefficient of correlation, according to whether the
criterion variable is a genuine dichotomy or an artificial one, Wherry
has shown that the application of familiar multiple-regression
techniques will lead to the determination of two sets of proportional
weights. Although he suggests in the summary of his article that the
weights obtained from application of multiple-regression procedures
are identical (or proportional) to those found by the discriminant-
function approach irrespective of the assumption regarding the nature
of the criterion variable, Wherry has not shown clearly the
comparability of his multiple-regression approach to that of the
discriminant function involving use of analysis of variance.

Despite the fact that Wherry’s normal equations resemble closely those
associated with the determination of weights in the discriminant-
function procedure as described by Fisher (1), Travers (4), Garrett
(2), and Hoel (3), the sums of squares of scores on a given
independent variable (e.g., a test) and the product moments of scores
on two given independent variables (e.g., two tests) are calculated
with respect to two different sets of means. In the multiple-
regression approach the two groups composing the criterion variable
are combined, and calculations are effected with respect to the single
composite mean for each independent variable; when the discriminant-
function approach is followed, however, two means (one for each of the
two criterion groups) are employed in the determination of sums of
squares and product moments.


Problem


It is the purpose of the writers to show both analytically and
numerically the identity or proportionality of the weights assigned to
the independent variables through use of either the discriminant
function or the multiple-regression technique when a twofold
classification is employed in the dependent variable. It is assumed
that the number of measures for each independent variable is the
same. No attempt is made to relate the procedures described to a
generalized discriminant function in which more than two types or
levels of classification may exist in the dependent variable.


Background 


In the use of the simple linear discriminant function a linear
combination of scores in the independent variables is found such that
the ratio of the square of the difference between the means of the two
criterion groups (i.e., the two groups for which the measures
constitute the dependent variable) to the sum of squares of the
individual scores within each group about the mean of that group is a
maximum. In a set of v independent variables (e.g., experimentally
independent tests), $X_1, \ldots, X_p, X_q, \ldots, X_v$, the score
$z_{gi}$ of individual i in one of the two (criterion) groups of
classification (g = 1, 2) is a linear combination of his scores in
each of the independent variables:

$$ z_{gi} = \lambda_1 X_{g1i} + \cdots + \lambda_p X_{gpi} + \lambda_q X_{gqi} + \cdots + \lambda_v X_{gvi}, \qquad (1) $$

in which $X_{gpi}$ represents the score of individual i of criterion
group g on the independent variable (test) p, and in which
$\lambda_1, \ldots, \lambda_p, \lambda_q, \ldots, \lambda_v$ represent
the weights to be determined that will maximize the ratio cited. If
$\bar z_1$ and $\bar z_2$ designate the means of the scores furnished
by equation (1) relative to the presence of n individuals in group 1
and in group 2, respectively, and if $\bar z_g$ denotes the mean of
group g (g = 1, 2), then the function defining the ratio to be
maximized is given by

$$ F = \frac{(\bar z_1 - \bar z_2)^2}{\sum_{g=1}^{2}\sum_{i=1}^{n}(z_{gi} - \bar z_g)^2}. \qquad (2) $$


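From equation (1) it is apparent that the difference term in the
numerator of (2) may be written as

$$ \bar z_1 - \bar z_2 = \lambda_1(\bar X_{11} - \bar X_{21}) + \cdots + \lambda_p(\bar X_{1p} - \bar X_{2p}) + \cdots + \lambda_v(\bar X_{1v} - \bar X_{2v}), \qquad (3) $$

in which $\bar X_{1p}$ and $\bar X_{2p}$ stand for the mean scores of
group 1 and group 2 on the independent variable p. This equation may
be written simply as

$$ \bar z_1 - \bar z_2 = \lambda_1 d_1 + \cdots + \lambda_p d_p + \cdots + \lambda_v d_v, \qquad (3') $$

in which $d_p = \bar X_{1p} - \bar X_{2p}$.
Clearly the numerator of (2) may be expressed as

$$ (\bar z_1 - \bar z_2)^2 = (\lambda_1 d_1 + \cdots + \lambda_p d_p + \cdots + \lambda_v d_v)^2 = \Big(\sum_{p=1}^{v}\lambda_p d_p\Big)^2. \qquad (4) $$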


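The difference term in the denominator may be written as

$$ z_{gi} - \bar z_g = \lambda_1(X_{g1i} - \bar X_{g1}) + \cdots + \lambda_p(X_{gpi} - \bar X_{gp}) + \cdots + \lambda_v(X_{gvi} - \bar X_{gv}), \qquad (5) $$

where $\bar X_{gp}$ is the mean of the scores of the individuals in one
of the two groups g upon the independent variable p. If equation (5)
is squared and summed first with respect to the number of individuals
in each of the two groups g, and if in turn the sum of the two group
sums is found, the resulting expression may be written as

$$ \sum_{g=1}^{2}\sum_{i=1}^{n}(z_{gi} - \bar z_g)^2 = \sum_{q=1}^{v}\sum_{p=1}^{v}\lambda_p\lambda_q\sum_{g=1}^{2}\sum_{i=1}^{n}(X_{gpi} - \bar X_{gp})(X_{gqi} - \bar X_{gq}). \qquad (6) $$

If $S_{pq}$ is said to define
$\sum_{g=1}^{2}\sum_{i=1}^{n}(X_{gpi} - \bar X_{gp})(X_{gqi} - \bar X_{gq})$,
then (6) may be written as

$$ \sum_{g=1}^{2}\sum_{i=1}^{n}(z_{gi} - \bar z_g)^2 = \sum_{q=1}^{v}\sum_{p=1}^{v}\lambda_p\lambda_q S_{pq}. \qquad (7) $$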


It should perhaps be noted that the sums of squares of individual
scores on an independent variable p about the means of group 1 and
group 2 in turn would be given by

$$ S_{pp} = \sum_{g=1}^{2}\sum_{i=1}^{n}(X_{gpi} - \bar X_{gp})^2. \qquad (8) $$

Likewise, the sum of the product moments of the individual scores on
the two independent variables p and q relative to the two sets of
means would be furnished by

$$ S_{pq} = \sum_{g=1}^{2}\sum_{i=1}^{n}(X_{gpi} - \bar X_{gp})(X_{gqi} - \bar X_{gq}). \qquad (8') $$

Expressions (8) and (8'), of course, constitute the numerators that
enter into familiar variance and covariance terms.

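Substitution of (4) and (7) into the numerator and denominator of (2)
yields:

$$ F = \frac{\Big(\sum_{p=1}^{v}\lambda_p d_p\Big)^2}{\sum_{q=1}^{v}\sum_{p=1}^{v}\lambda_p\lambda_q S_{pq}}. \qquad (9) $$

In order that those values of the $\lambda$'s in (9) required to
maximize the function may be determined, it is necessary to
differentiate the function with respect to
$\lambda_1, \ldots, \lambda_p, \lambda_q, \ldots, \lambda_v$ in turn
and to set the partial derivatives equal to zero. Subsequent
simplification leads to the following set of linear equations:

$$
\begin{aligned}
\lambda_1 S_{11} + \cdots + \lambda_p S_{1p} + \lambda_q S_{1q} + \cdots + \lambda_v S_{1v} &= c\,d_1 \\
&\;\;\vdots \\
\lambda_1 S_{p1} + \cdots + \lambda_p S_{pp} + \lambda_q S_{pq} + \cdots + \lambda_v S_{pv} &= c\,d_p \\
\lambda_1 S_{q1} + \cdots + \lambda_p S_{qp} + \lambda_q S_{qq} + \cdots + \lambda_v S_{qv} &= c\,d_q \\
&\;\;\vdots \\
\lambda_1 S_{v1} + \cdots + \lambda_p S_{vp} + \lambda_q S_{vq} + \cdots + \lambda_v S_{vv} &= c\,d_v
\end{aligned}
\qquad (10)
$$

where $c = (\lambda_1 d_1 + \cdots + \lambda_v d_v)/F$, a quantity which
is independent of the parameter p.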
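The roots of equation (10) are given by

$$ \lambda_p = \frac{\Delta_{S_p}}{\Delta_S}, $$

where $\Delta_S$ is the determinant of $S_{pq}$ coefficients

$$ \Delta_S = \begin{vmatrix} S_{11} & \cdots & S_{1p} & S_{1q} & \cdots & S_{1v} \\ \vdots & & \vdots & \vdots & & \vdots \\ S_{p1} & \cdots & S_{pp} & S_{pq} & \cdots & S_{pv} \\ S_{q1} & \cdots & S_{qp} & S_{qq} & \cdots & S_{qv} \\ \vdots & & \vdots & \vdots & & \vdots \\ S_{v1} & \cdots & S_{vp} & S_{vq} & \cdots & S_{vv} \end{vmatrix} $$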


June, 1956) MICHAEL - PERRY 


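in (10), and where $\Delta_{S_p}$ is the same as $\Delta_S$ except for
the substitution of the column of $c\,d$ entries (i.e.,
$c\,d_1, \ldots, c\,d_p, c\,d_q, \ldots, c\,d_v$) for the column of
$S_{qp}$ entries ($q = 1, \ldots, v$) that represent the coefficients
of $\lambda_p$ appearing on the left-hand side of the equations.

Since $\Delta_S$ is present in the denominator of each of the
expressions for the roots, it is apparent that the weights are
proportional to the entries in the numerators. Hence

$$ \lambda_1 : \lambda_2 : \cdots : \lambda_p : \lambda_q : \cdots : \lambda_v = \Delta_{S_1} : \Delta_{S_2} : \cdots : \Delta_{S_p} : \Delta_{S_q} : \cdots : \Delta_{S_v}. \qquad (11) $$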
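Equations (10) and the roots (11) can be exercised numerically. The sketch below is plain Python with exact fractions; the two small data sets are invented for illustration, and the helper names (`det`, `cramer`, `within_sscp_and_d`, `fisher_ratio`) are ours. It forms $S_{pq}$ and $d_p$ as defined above, solves (10) with c = 1 by Cramer's rule, and checks that the resulting weights give a larger value of criterion (2) than several arbitrary weightings. It also checks that rescaling the weights leaves F unchanged, which is why c may be set to unity:

```python
from fractions import Fraction


def det(matrix):
    """Determinant by Gaussian elimination, exact with Fraction arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def cramer(matrix, rhs):
    """Roots x_p = det(matrix with column p replaced by rhs) / det(matrix)."""
    base = det(matrix)
    roots = []
    for p in range(len(rhs)):
        replaced = [row[:p] + [rhs[r]] + row[p + 1:] for r, row in enumerate(matrix)]
        roots.append(det(replaced) / base)
    return roots


def within_sscp_and_d(group1, group2):
    """S_pq about each group's own means (pooled) and d_p = mean1 - mean2."""
    v = len(group1[0])
    m1 = [Fraction(sum(row[p] for row in group1), len(group1)) for p in range(v)]
    m2 = [Fraction(sum(row[p] for row in group2), len(group2)) for p in range(v)]
    S = [[sum((row[p] - m[p]) * (row[q] - m[q])
              for grp, m in ((group1, m1), (group2, m2)) for row in grp)
          for q in range(v)] for p in range(v)]
    d = [a - b for a, b in zip(m1, m2)]
    return S, d


def fisher_ratio(w, S, d):
    """Criterion (2): (sum_p w_p d_p)^2 / sum_p sum_q w_p w_q S_pq."""
    v = len(w)
    numerator = sum(w[p] * d[p] for p in range(v)) ** 2
    denominator = sum(w[p] * w[q] * S[p][q] for p in range(v) for q in range(v))
    return numerator / denominator


# Hypothetical scores of n = 4 individuals per group on v = 2 tests.
group1 = [[3, 7], [5, 6], [4, 9], [6, 8]]
group2 = [[2, 5], [3, 4], [1, 6], [4, 5]]

S, d = within_sscp_and_d(group1, group2)
lam = cramer(S, d)                 # equations (10) with c = 1
print(lam)                         # [Fraction(19, 66), Fraction(29, 66)]
print(fisher_ratio(lam, S, d))     # 221/132
print(fisher_ratio([1, 1], S, d))  # 81/52, smaller
```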


In the determination of the weights from the normal equations that are
basic to the multiple-regression approach, it is important again to
emphasize that the sums of squares are taken about the mean of the
scores of group 1 and group 2 combined on a given independent
variable. In other words, $\bar X_p = (n\bar X_{1p} + n\bar X_{2p})/N$,
where N = 2n. Likewise, the product moments for the two independent
variables p and q are calculated with respect to the means $\bar X_p$
and $\bar X_q$. The sum of squares $T_{pp}$ of individual scores about
the mean $\bar X_p$ and the sum of the product moments $T_{pq}$ about
the means $\bar X_p$ and $\bar X_q$ are defined as follows:

$$ T_{pp} = \sum_{g=1}^{2}\sum_{i=1}^{n}(X_{gpi} - \bar X_p)^2, \qquad (12) $$

$$ T_{pq} = \sum_{g=1}^{2}\sum_{i=1}^{n}(X_{gpi} - \bar X_p)(X_{gqi} - \bar X_q). \qquad (13) $$


The normal equations involving T, which except for the notation
employed are the same as those in the set (14a) of Wherry’s
development (5), may be written as follows:

$$
\begin{aligned}
W_1 T_{11} + \cdots + W_p T_{1p} + W_q T_{1q} + \cdots + W_v T_{1v} &= N d_1 \\
&\;\;\vdots \\
W_1 T_{p1} + \cdots + W_p T_{pp} + W_q T_{pq} + \cdots + W_v T_{pv} &= N d_p \\
W_1 T_{q1} + \cdots + W_p T_{qp} + W_q T_{qq} + \cdots + W_v T_{qv} &= N d_q \\
&\;\;\vdots \\
W_1 T_{v1} + \cdots + W_p T_{vp} + W_q T_{vq} + \cdots + W_v T_{vv} &= N d_v
\end{aligned}
\qquad (14)
$$

The roots of the equations are given by

$$ W_p = \frac{\Delta_{T_p}}{\Delta_T}, $$

where the entries in the determinants $\Delta_{T_p}$ and $\Delta_T$
correspond closely to those in $\Delta_{S_p}$ and $\Delta_S$, with the
exceptions that T is to be substituted for S and N for c. (It may be
informative to mention that the $d_p$ values in the right-hand members
of the set of linear equations appearing in (14) correspond to the
numerators of the formulas for the biserial and point biserial
coefficients, and that the constants in these formulas are
incorporated within the W values.)
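The normal equations (14) are those of an ordinary least-squares regression in which the dichotomous criterion is given numerical codes. In the sketch below the data are hypothetical, and the coding of group 1 as +2 and group 2 as -2 is our illustrative choice, picked so that the right-hand sides come out exactly as $N d_p$. Any other pair of distinct codes would change W only by a constant factor. The weights from (14) coincide with the slopes of a least-squares fit (with intercept) of the coded criterion on the tests; `solve` and `column_means` are our helper names:

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A assumed nonsingular)."""
    n = len(b)
    m = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


def column_means(data):
    v = len(data[0])
    return [Fraction(sum(r[p] for r in data), len(data)) for p in range(v)]


# Hypothetical scores of n = 4 individuals per group on v = 2 tests.
group1 = [[3, 7], [5, 6], [4, 9], [6, 8]]
group2 = [[2, 5], [3, 4], [1, 6], [4, 5]]
rows = group1 + group2
N, v = len(rows), len(rows[0])

mean, mean1, mean2 = column_means(rows), column_means(group1), column_means(group2)
d = [a - b for a, b in zip(mean1, mean2)]

# T_pq about the combined means, as in (12) and (13).
T = [[sum((r[p] - mean[p]) * (r[q] - mean[q]) for r in rows) for q in range(v)]
     for p in range(v)]

# Normal equations (14): T W = N d.
W = solve(T, [N * dp for dp in d])

# Least squares with an intercept: the slopes b solve T b = sum_i (x_i - mean) y_i.
# Coding group 1 as +2 and group 2 as -2 makes that right-hand side equal N d.
y = [2] * len(group1) + [-2] * len(group2)
xty = [sum((r[p] - mean[p]) * yi for r, yi in zip(rows, y)) for p in range(v)]
b = solve(T, xty)

print(W)       # [Fraction(152, 287), Fraction(232, 287)]
print(b == W)  # True
```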

Since $\Delta_T$ is present in the denominator of each of the
expressions for the W roots, the weights assigned to the independent
variables in the multiple-regression equation are proportional to the
numerator entries. Therefore

$$ W_1 : W_2 : \cdots : W_p : W_q : \cdots : W_v = \Delta_{T_1} : \Delta_{T_2} : \cdots : \Delta_{T_p} : \Delta_{T_q} : \cdots : \Delta_{T_v}. $$


Demonstration of the Proportionality of the $\lambda$ and W Weights


If it can be shown that $\Delta_{T_p} = k\Delta_{S_p}$ for every
p = 1, ..., v, where k is a constant, then it may be concluded that
the $\lambda$ values are proportional to the W values; in other words,
that

$$ \lambda_1 : \lambda_2 : \cdots : \lambda_p : \lambda_q : \cdots : \lambda_v = W_1 : W_2 : \cdots : W_p : W_q : \cdots : W_v. $$

Since each of the entries in $\Delta_{T_p}$ can be expressed as a
function of those appearing in $\Delta_{S_p}$, a means is furnished
for demonstrating the proportionality of the $\Delta_{S_p}$ and
$\Delta_{T_p}$ determinants.


The relationships between the $T_{pp}$ and $S_{pp}$ entries and
between the $T_{pq}$ and $S_{pq}$ entries in $\Delta_{T_p}$ and
$\Delta_{S_p}$, respectively, may be described as follows:

$$ T_{pp} = S_{pp} + n d_{1p}^2 + n d_{2p}^2 \qquad (15) $$

and

$$ T_{pq} = S_{pq} + n d_{1p} d_{1q} + n d_{2p} d_{2q}, \qquad (16) $$

where $d_{1p} = \bar X_{1p} - \bar X_p$ and $d_{2p} = \bar X_{2p} - \bar X_p$,
the difference between the mean of each criterion subgroup in turn and
the composite mean of the total group with respect to an independent
variable p.

Since the number of cases in each subgroup is the same,
$d_{1p} = -d_{2p} = \tfrac{1}{2} d_p$. Hence equations (15) and (16)
may be simplified to

$$ T_{pp} = S_{pp} + 2n d_{1p}^2 \qquad (17) $$

and

$$ T_{pq} = S_{pq} + 2n d_{1p} d_{1q}. \qquad (18) $$

Furthermore, since $d_p = 2d_{1p}$ (so that $|d_p| = 2|d_{1p}|$), it
is apparent that values of $2d_{1p}$ can be substituted for each of
the $d_p$ quantities appearing in the right-hand members of equation
(10) and of equation (14).
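Equations (15) through (18) are the usual split of a total sum of squares (or of products) into a within-group part and a between-group part. A short check on invented data, with equal group sizes as the derivation requires (the helper names `column_means` and `sscp` are ours):

```python
from fractions import Fraction


def column_means(data):
    v = len(data[0])
    return [Fraction(sum(r[p] for r in data), len(data)) for p in range(v)]


def sscp(data, centers):
    """Sums of squares and cross-products of the rows of data about centers."""
    v = len(centers)
    return [[sum((r[p] - centers[p]) * (r[q] - centers[q]) for r in data)
             for q in range(v)] for p in range(v)]


# Hypothetical scores; both groups have n = 4 members, as the text assumes.
group1 = [[3, 7], [5, 6], [4, 9], [6, 8]]
group2 = [[2, 5], [3, 4], [1, 6], [4, 5]]
n, v = len(group1), len(group1[0])

m1, m2 = column_means(group1), column_means(group2)
m = column_means(group1 + group2)      # composite means
d1 = [a - b for a, b in zip(m1, m)]    # d_1p
d2 = [a - b for a, b in zip(m2, m)]    # d_2p
d = [a - b for a, b in zip(m1, m2)]    # d_p

# S: about each group's own means, pooled; T: about the composite means.
S1, S2 = sscp(group1, m1), sscp(group2, m2)
S = [[S1[p][q] + S2[p][q] for q in range(v)] for p in range(v)]
T = sscp(group1 + group2, m)

# Equations (15)-(16), and their equal-n simplification (17)-(18).
T_15_16 = [[S[p][q] + n * d1[p] * d1[q] + n * d2[p] * d2[q] for q in range(v)]
           for p in range(v)]
T_17_18 = [[S[p][q] + 2 * n * d1[p] * d1[q] for q in range(v)] for p in range(v)]

print(T == T_15_16 == T_17_18)                             # True
print(all(d1[p] == -d2[p] == d[p] / 2 for p in range(v)))  # True
```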

Actually, if c in (10) is taken to be unity (as may be done for
convenience in the solution of the $\lambda$'s), it can be shown that
$\Delta_{T_p} = N\Delta_{S_p}$. The existence of proportionality
between the two determinants may be demonstrated through use of the
relationships presented in (17) and (18) and from the fact that
$N d_p = N(2d_{1p})$. (See determinant (19).)

This determinant may be readily evaluated through the formation of
v - 1 sets of two determinants that are effected from the separation
of the two terms that appear in each of the v - 1 columns of (19)
other than the column of $N(2d_{1p})$ entries. With the exception of
one column, the entries in the two determinants of each new set will
be the same. In the first determinant of the first set one column will
be made up of entries of the $S_{rq}$ type (r denoting the row), and in
the second determinant of the set a corresponding column of terms of
the $2n\,d_{1r}d_{1q}$ type will be present. It will be found that the
value of the second determinant of the set is zero, since, subsequent
to the factoring out of constants, it will contain two columns that
are alike. The first determinant of the first set may then be
separated into two new determinants (which constitute a second set)
by the placement of the $S_{rq}$ and $2n\,d_{1r}d_{1q}$ entries of
another column into the corresponding columns of the first and second
determinants that serve to constitute the second set. This process may
be continued until it is found that $\Delta_{T_p}$ and $\Delta_{S_p}$
are identical except for the presence of a constant factor N in the
column containing the $2d_{1r}$ entries.

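It may be helpful to illustrate the process in the instance in which
v = 4 and in which the determinants $\Delta_{T_1}$ and $\Delta_{S_1}$
are desired prior to the determination of the weights $W_1$ and
$\lambda_1$. An abbreviated notation for the elements of the
determinants will be adopted: each determinant is represented only by
the entries that would appear in its third row (p = 3). Initially, it
is seen that

$$ \Delta_{T_1} = \begin{vmatrix} N(2d_{13}) & S_{32} + 2nd_{13}d_{12} & S_{33} + 2nd_{13}^2 & S_{34} + 2nd_{13}d_{14} \end{vmatrix} = \begin{vmatrix} N(2d_{13}) & S_{32} & T_{33} & T_{34} \end{vmatrix} + \begin{vmatrix} N(2d_{13}) & 2nd_{13}d_{12} & T_{33} & T_{34} \end{vmatrix}. $$

Since the presence of two proportional columns in the second
determinant of the first set causes that determinant to vanish, it is
necessary to form a second set of determinants only for the first
determinant of the first set. Thus

$$ \begin{vmatrix} N(2d_{13}) & S_{32} & T_{33} & T_{34} \end{vmatrix} = \begin{vmatrix} N(2d_{13}) & S_{32} & S_{33} & T_{34} \end{vmatrix} + \begin{vmatrix} N(2d_{13}) & S_{32} & 2nd_{13}^2 & T_{34} \end{vmatrix}. $$

Again the second determinant of the set vanishes. The third and final
set of two determinants is derived from the first determinant of the
second set:

$$ \begin{vmatrix} N(2d_{13}) & S_{32} & S_{33} & T_{34} \end{vmatrix} = \begin{vmatrix} N(2d_{13}) & S_{32} & S_{33} & S_{34} \end{vmatrix} + \begin{vmatrix} N(2d_{13}) & S_{32} & S_{33} & 2nd_{13}d_{14} \end{vmatrix}. $$

Again the second determinant is equal to zero. Consequently, it is
seen that

$$ \Delta_{T_1} = \begin{vmatrix} N(2d_{13}) & S_{32} & S_{33} & S_{34} \end{vmatrix} = N\begin{vmatrix} 2d_{13} & S_{32} & S_{33} & S_{34} \end{vmatrix} = N\Delta_{S_1}. $$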

The procedure described in the instance that v = 4 can be readily
extended to any number of independent variables. Therefore, since
$\Delta_{T_p} = N\Delta_{S_p}$, it can be concluded that the
$\lambda$ and W weights associated with the simple discriminant
function and the multiple-regression technique, respectively, are
proportional.
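The result $\Delta_{T_p} = N\Delta_{S_p}$ can be checked on the numbers of the illustrative example that follows. The matrices below are the coefficients of that example's two sets of equations. The leading coefficient of the first $\lambda$ equation must be 48, since (17) requires $T_{11} = S_{11} + 2n d_{11}^2 = 48 + 8 = 56$. This is a sketch in plain Python with exact fractions, and the helper names are ours:

```python
from fractions import Fraction


def det(matrix):
    """Determinant by Gaussian elimination with exact Fraction arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def with_column(matrix, p, column):
    """Copy of matrix with column p replaced by the given column."""
    return [row[:p] + [column[r]] + row[p + 1:] for r, row in enumerate(matrix)]


# Within-group (S) and total (T) sums of squares and products of the
# illustrative example, with d_p = (2, 2, 4), n = 4 per group and N = 8.
S = [[48, 13, 79], [13, 20, 11], [79, 11, 178]]
T = [[56, 21, 95], [21, 28, 27], [95, 27, 210]]
d = [2, 2, 4]
n, N = 4, 8
v = len(d)

# (17)-(18): T_pq = S_pq + 2n d_1p d_1q, with d_1p = d_p / 2.
d1 = [Fraction(x, 2) for x in d]
assert T == [[S[p][q] + 2 * n * d1[p] * d1[q] for q in range(v)] for p in range(v)]

delta_S, delta_T = det(S), det(T)
num_S = [det(with_column(S, p, d)) for p in range(v)]                  # c = 1
num_T = [det(with_column(T, p, [N * x for x in d])) for p in range(v)]

lam = [x / delta_S for x in num_S]
W = [x / delta_T for x in num_T]

print(delta_S, delta_T)                        # 32764 50876
print([int(x) for x in num_S], [int(x) for x in num_T])
print([round(float(x / -lam[0]), 7) for x in lam])  # [-1.0, 2.1090909, 0.7318182]
print([round(float(x / -W[0]), 7) for x in W])      # the same ratios
```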


Illustrative Example 


Given the following fictitious data representing the scores of two
groups (g = 1, 2) of four candidates (n = 4) on three tests
(p = 1, 2, 3), find a system of weights ($\lambda$ values) that will
effect the maximum degree of separation between the mean composite
scores of the two groups:


Group 1 Group 2 
Test1 Test2 Test3 Test1 Test2 Test3 
2 11 16 
1 12 12 
4 15 20 
14 24 


In addition, if the two groups are combined with respect to the scores
received on each test, find the regression weights (W values) that
will yield the maximum degree of correlation between the scores on a
weighted combination of the test variables and the dichotomous
criterion (membership in group 1 or group 2).

The determinant (19) referred to in the demonstration above, in which
$\Delta_{T_p}$ is written in terms of the S and $d_1$ entries, is

$$ \Delta_{T_p} = \begin{vmatrix} S_{11} + 2nd_{11}^2 & \cdots & N(2d_{11}) & S_{1q} + 2nd_{11}d_{1q} & \cdots & S_{1v} + 2nd_{11}d_{1v} \\ \vdots & & \vdots & \vdots & & \vdots \\ S_{p1} + 2nd_{1p}d_{11} & \cdots & N(2d_{1p}) & S_{pq} + 2nd_{1p}d_{1q} & \cdots & S_{pv} + 2nd_{1p}d_{1v} \\ S_{q1} + 2nd_{1q}d_{11} & \cdots & N(2d_{1q}) & S_{qq} + 2nd_{1q}^2 & \cdots & S_{qv} + 2nd_{1q}d_{1v} \\ \vdots & & \vdots & \vdots & & \vdots \\ S_{v1} + 2nd_{1v}d_{11} & \cdots & N(2d_{1v}) & S_{vq} + 2nd_{1v}d_{1q} & \cdots & S_{vv} + 2nd_{1v}^2 \end{vmatrix} \qquad (19) $$

Employment of equation (10) with c taken as unity will yield the
$\lambda$ values, and use of the normal equations in (14) will furnish
the W coefficients. When the appropriate sums of squares and product
moments are found, the two sets of equations turn out to be

$$
\begin{aligned}
48\lambda_1 + 13\lambda_2 + 79\lambda_3 &= 2 \\
13\lambda_1 + 20\lambda_2 + 11\lambda_3 &= 2 \\
79\lambda_1 + 11\lambda_2 + 178\lambda_3 &= 4
\end{aligned}
$$

and

$$
\begin{aligned}
56W_1 + 21W_2 + 95W_3 &= 8(2) \\
21W_1 + 28W_2 + 27W_3 &= 8(2) \\
95W_1 + 27W_2 + 210W_3 &= 8(4)
\end{aligned}
$$

for which the roots are

$$ \lambda_1 = \frac{2(-880)}{32764} = -.0537175, \qquad \lambda_2 = \frac{2(1856)}{32764} = .1132951, \qquad \lambda_3 = \frac{2(644)}{32764} = .0393114, $$

$$ W_1 = \frac{16(-880)}{50876} = -.2767513, \qquad W_2 = \frac{16(1856)}{50876} = .5836937, \qquad W_3 = \frac{16(644)}{50876} = .2025316. $$

From the appearance of the numerical entries -880, 1856, and 644 in
the numerators of the expressions for the $\lambda$'s and the W's, it
is obvious that the two sets of weights are proportional. It is found
that the ratios $\lambda_1 : \lambda_2 : \lambda_3$ and
$W_1 : W_2 : W_3$ are both equal to -1.000000 : 2.109091 : .7318182.


REFERENCES

1. Fisher, R. A. ‘‘The Use of Multiple Measurements in Taxonomic
   Problems,’’ Annals of Eugenics, VII (1937), pp. 179-188.

2. Garrett, H. E. ‘‘The Discriminant Function and its Use in
   Psychology,’’ Psychometrika, VIII (1943), pp. 65-79.

3. Hoel, P. G. Introduction to Mathematical Statistics (New York:
   John Wiley and Sons, 1947).

4. Travers, R. M. W. ‘‘The Use of a Discriminant Function in the
   Treatment of Psychological Group Differences,’’ Psychometrika,
   IV (1939), pp. 25-32.

5. Wherry, R. J. ‘‘Multiple Bi-Serial and Multiple Point Bi-Serial
   Correlation,’’ Psychometrika, XII (1947), pp. 189-195.


THE INFLUENCE OF ITEM MODALITY ON
THE DIMENSION MEASURED BY A TEST*


DONALD M. MEDLEY 
Indiana University** 


Introduction 


WHEN A battery of proficiency tests for bombing-navigation systems
mechanics was validated, one of the tests, designed to measure ability
to read dials and indicators, failed to discriminate significantly
among mechanics of different levels of proficiency. A task-oriented
analysis and the unanimous opinion of experts in systems maintenance
both indicated that the ability to read dials and indicators is an
important part of routine maintenance skill. In seeking to account for
the apparent failure of the test to measure this ability, it was
suggested that the form of the items might be the cause.


The 34 items on the test are all in five-alternative multiple-choice
form. Each item presents a drawing of one of the dials in the system
with the hand pointing to a typical value. Four of the responses
consist of more or less plausible readings, and the fifth is a ‘‘none
of the above is correct’’ type of statement. The mechanic being tested
is supposed to read the dial and indicate which response agrees best
with his reading.


Actually, what may be happening is that the 
mechanic is making comparisons between the 
responses and the picture of the dial, and elim- 
inating some or all of the responses as obvious- 
ly wrong. The task offered by the test probably 
does not resemble the task he must perform on 
the job, when, rather than choose among four 
readings, he must get the reading from the dial 
without aid of any sort. 

This possibility seemed strong enough to warrant investigation, so a
second form of the test was constructed, consisting of the same 34
items in completion form, that is, with blanks in which the mechanic
was to write the value on the dial as he read it. His reading was
considered correct if it fell within accepted tolerance limits (for
that dial) of the actual reading.

A cross-validation study of the battery was contemplated in the
future. In the meantime, a decision was made to administer the two
forms to graduates of a training course for mechanics and to analyze
their scores to see whether or not the completion form measures the
same thing that the multiple-choice form measures. If it does measure
the same thing, it is no better than the multiple-choice test and
there will be no need to include either form in the battery to be
cross-validated; if it measures something else, it will be worthwhile
to include it.


Methodological Note 


The problem of testing to see whether two 
forms of a test are equivalent, as well as that of 
testing to see whether they are equally reliable, 
has received little attention in the published liter- 
ature on testing. Gulliksen (1:173) has suggest- 
ed the use of criteria proposed by Wilks (6) for 
testing the equality of means, variances, and co- 
variances of three or more forms of a test ad- 
ministered to the same group, and by Votaw (5) 
for testing the same hypothesis plus equality of 
validity coefficients for a given criterion. These 
techniques are not applicable to the present prob- 
lem. Only two forms are available; moreover, 
the two forms cannot be administered to the same 
individuals because of the identity in the content 
of the items in the two forms. Besides, these techniques fail to test
the most important property of equivalent tests, that is, whether they
measure the same thing or not.

The technique to be described and applied here 
provides for the testing of equality of means, var- 
iances, and reliabilities, and also whether or not 
the tests are homogeneous as to content. 

Many problems of long standing having to do with item writing
(effects of changes in form, of response sets, etc.) are amenable to
attack by this method. Since it has not appeared in the literature
before, it will be described in some detail here.

In the discussion that follows, the completion form of the test will
be referred to as Form A, and the multiple-choice form as Form B. The
two tests differ only in the modality or form of the items, the
content being identical item for item.

The group to whom the test was administered 


* This research was supported in part by the United States Air Force under Contract No. 18(600)-306, moni-
tored by the Armament Systems Personnel Research Laboratory of the Air Research and Development Com-
mand, Air Force Personnel and Training Research Center. Permission is granted for reproduction, trans-
lation, publication, use and disposal in whole and in part by or for the United States Government.


** Now with the Division of Teacher Education of the Board of Higher Education, New York City.




consisted of 52 students who had just completed their school training
as mechanics. The class consisted of three shifts, and each shift
comprised two sections, so that there were six sections in all. Both
forms of the test were given in each shift, one to a section. The two
forms were assigned to the two sections on a shift at random, so that
any differences between the three sections taking Form A and the three
taking Form B would be random ones. This was necessary so that any
real differences in the scores on the forms could be attributed to
differences in the forms rather than in the groups taking them.

The three sections taking Form A will be referred to as Sections A1,
A2, and A3; those taking Form B, as Sections B1, B2, and B3. Sections
A1, A2, and A3 consisted of 10, 10, and 7 men, and Sections B1, B2,
and B3 of 9, 8, and 8 men, respectively. There were thus 27 men in all
taking Form A, and 25 taking Form B.


Analysis of the Data 


Hoyt (2) has shown how the reliability of a test may be estimated from
an analysis of variance of the item scores. Under the hypothesis that
Form A and Form B are equivalent item by item (that is, that the kth
item on Form A and the kth item on Form B are interchangeable), a
34-item test has in effect been administered to 52 individuals. It is
then possible to set up an analysis after the manner of Hoyt, with the
result shown in Table I. The total sum of squares of the 52 × 34 =
1,768 item scores has been analyzed into three independent components,
due respectively to individual differences, differences in item
difficulties, and error of measurement of the test.

The reliability coefficient of the test under this hypothesis may be
computed from the last column (the mean squares) as follows:

$$ r = \frac{1.11918 - 0.18459}{1.11918} = .835. $$

This coefficient is identical with that which would be obtained from
the formula presented by Kuder and Richardson (4) and derived by them
as the correlation between the test and a hypothetical parallel form.
If the items in the test are considered as representing a universe of
items, this reliability coefficient may be thought of as an estimate
of the homogeneity of the item universe, and the error mean square as
an estimate of the error in sampling the universe of items. Two tests
may then be considered to be equivalent if, in addition to having
equal means and variances, they sample the same universe of items with
the same error.
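Hoyt's coefficient uses only the mean squares for individuals and for error: $r = (MS_{ind} - MS_{err})/MS_{ind}$. The sketch below uses invented data, and the function names are ours. It computes the three mean squares from a persons-by-items table of 0/1 scores, shows that the coefficient equals Kuder-Richardson formula 20 when the KR-20 variances are taken over N, and applies the formula to the mean squares of Table I. We assume formula 20 is the Kuder-Richardson formula the text refers to:

```python
from fractions import Fraction


def hoyt_mean_squares(x):
    """Persons x items analysis of variance of 0/1 item scores, after Hoyt.

    Returns (MS for individuals, MS for items, MS for error).
    """
    N, k = len(x), len(x[0])
    grand = Fraction(sum(map(sum, x)), N * k)
    person_means = [Fraction(sum(row), k) for row in x]
    item_means = [Fraction(sum(row[i] for row in x), N) for i in range(k)]
    ss_total = sum((s - grand) ** 2 for row in x for s in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = N * sum((m - grand) ** 2 for m in item_means)
    ss_error = ss_total - ss_persons - ss_items
    return (ss_persons / (N - 1), ss_items / (k - 1),
            ss_error / ((N - 1) * (k - 1)))


def hoyt_reliability(ms_individuals, ms_error):
    return (ms_individuals - ms_error) / ms_individuals


def kr20(x):
    """Kuder-Richardson formula 20, with variances taken over N (not N - 1)."""
    N, k = len(x), len(x[0])
    totals = [sum(row) for row in x]
    mean_total = Fraction(sum(totals), N)
    var_total = sum((t - mean_total) ** 2 for t in totals) / N
    p = [Fraction(sum(row[i] for row in x), N) for i in range(k)]
    return Fraction(k, k - 1) * (1 - sum(pi * (1 - pi) for pi in p) / var_total)


# Invented 0/1 scores of six examinees on four items.
x = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0],
     [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
ms_ind, ms_items, ms_err = hoyt_mean_squares(x)
print(hoyt_reliability(ms_ind, ms_err), kr20(x))  # 2/3 2/3

# Mean squares for individuals and for error from Table I.
print(round(hoyt_reliability(1.11918, 0.18459), 3))  # 0.835
```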

Before one can test the hypothesis that the two forms of the test are
equivalent, the variance must be analyzed further. The comparisons
among individuals, for which 51 degrees of freedom are available, can
be broken down into comparisons among those who took Form A of the
test, with 26 degrees of freedom; among those who took Form B, with 24
degrees of freedom; and one comparison between the means of the two
groups, with one degree of freedom. The 26 degrees of freedom among
individuals taking Form A can be broken down further into comparisons
among individuals within Sections A1, A2, and A3, with 9, 9, and 6
degrees of freedom, respectively, and a comparison among the three
section means with 2 degrees of freedom; and the 24 degrees of freedom
among men taking Form B can be analyzed into comparisons among
individuals within Sections B1, B2, and B3, with 8, 7, and 7 degrees
of freedom, respectively, and a comparison among the three section
means with 2 degrees of freedom. If the hypothesis is true and the
forms are equivalent, then none of these mean squares will be
significantly greater than the rest. If the mean squares are not
homogeneous, the hypothesis of equivalence must be rejected.

The sum of squares due to error of measurement can similarly be
analyzed, and all of the mean squares so obtained will be equal if the
hypothesis is true. Again, if they are not all equal, the hypothesis
of equivalence must be rejected. The complete analysis of variance in
Table II results.

Of particular interest in Table II are the mean squares for difference between form means and for error between forms. These mean squares will be used to test two things: whether the means of the two forms are equal, and whether the two forms may be said to sample the same item domain.

The test of the hypothesis depends on the as- 
sumption that the error for each item score is 
normally distributed with zero mean and a var- 
iance that is equal for all items. 


Results 


Because of the large number of mean squares, their homogeneity was tested in groups. All of the components of the variance among individuals except that due to difference between forms were tested by Bartlett's test (3:83) and found to be homogeneous (P > .50). It was, therefore, concluded that the two forms have equal variances. The corresponding sums of squares were then pooled to give an estimate, based on 50 degrees of freedom, of the variance due to differences in proficiency of men taking the same form of the test. This mean square may be referred to as ‘‘experimental error,’’ since it is equivalent to the ‘‘within variance’’ in an analysis of variance of total test scores.

A similar test was made of the homogeneity 


TABLE I
ANALYSIS OF VARIANCE UNDER NULL HYPOTHESIS

Source of Variation                  d.f.   Sum of Squares   Mean Square
Differences among individuals          51       57.07862         1.11918
Differences in item difficulties       33       74.18157         2.24792
Error                               1,683      310.67138         0.18459
Total                               1,767      441.93157


TABLE II
ANALYSIS OF VARIANCE WITH ORTHOGONAL BREAKDOWN

Source of Variation                        d.f.   Sum of Squares   Mean Square
Difference between form means                 1          …               …
Differences between sections on Form A        2          …               …
Differences between sections on Form B        2          …               …
Variations within Section A1                  9          …               …
Variations within Section A2                  9          …               …
Variations within Section A3                  6          …               …
Variations within Section B1                  8          …               …
Variations within Section B2                  7          …               …
Variations within Section B3                  7          …               …
Differences in item difficulties             33       74.18157         2.24792
Error between forms                          33       33.11365         1.00344
Error between sections on Form A             66       10.26439         0.15552
Error between sections on Form B             66       11.49250         0.17412
Error within Section A1                     297       55.62653         0.18729
Error within Section A2                     297       41.65000         0.13967
Error within Section A3                     198       34.60505         0.17477
Error within Section B1                     264       44.20602         0.16743
Error within Section B2                     231       42.15809         0.18250
Error within Section B3                     231       37.55515         0.16257
Total                                     1,767      441.93157




of all of the error mean squares except that between forms. No evidence of lack of homogeneity was found (P > .30), so the corresponding sums of squares were pooled to give a mean square for ‘‘measurement error’’ with 1,650 degrees of freedom. This corresponds to the squared standard error of measurement of the test in the Kuder-Richardson sense.

Readers familiar with modern experimental 
design will recognize this as a ‘‘split-plot’’ de- 
sign. 
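The pooling step, and the Bartlett test that precedes it, can be reproduced from the sums of squares and degrees of freedom in Table II. Below is a minimal Python sketch. It recomputes each mean square as SS/d.f. rather than reading the printed column. To avoid a SciPy dependency, it uses the standard closed form for the chi-square upper tail with integer degrees of freedom. The function names are ours.

```python
import math

# Error components from Table II as (d.f., sum of squares); error between forms is excluded.
ERROR_COMPONENTS = [
    (66, 10.26439), (66, 11.49250),                     # between sections, Forms A and B
    (297, 55.62653), (297, 41.65000), (198, 34.60505),  # within A1, A2, A3
    (264, 44.20602), (231, 42.15809), (231, 37.55515),  # within B1, B2, B3
]


def chi2_sf(x, k):
    """Upper-tail chi-square probability for integer d.f. k (closed-form series)."""
    if k % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, k // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (k - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def bartlett(components):
    """Bartlett's test of homogeneity for (d.f., SS) pairs; returns pooled MS, d.f., statistic, P."""
    dfs = [d for d, _ in components]
    variances = [ss / d for d, ss in components]
    nu = sum(dfs)
    pooled = sum(ss for _, ss in components) / nu
    m = nu * math.log(pooled) - sum(d * math.log(v) for d, v in zip(dfs, variances))
    c = 1 + (sum(1 / d for d in dfs) - 1 / nu) / (3 * (len(dfs) - 1))
    stat = m / c
    return pooled, nu, stat, chi2_sf(stat, len(dfs) - 1)


pooled, nu, stat, p = bartlett(ERROR_COMPONENTS)
print(f"measurement error MS = {pooled:.5f} on {nu} d.f.")
print(f"Bartlett chi-square = {stat:.2f} on {len(ERROR_COMPONENTS) - 1} d.f., P = {p:.2f}")
```

With these inputs the sketch gives a pooled mean square of about 0.1682 on 1,650 d.f. and a P value of roughly .36, in line with the P > .30 reported above.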

It is now possible to condense the analysis of variance into the form shown in Table II. The two tests already carried out indicate that the forms have equal variances and equal reliabilities. It remains only to decide whether they have equal means and whether or not they measure the same thing.

Before testing these portions of the hypothesis, a test of equality of the two errors was made. Experimental error in this case was found to be significantly greater than the error of measurement of the test, indicating that the test in either form does measure individual differences. The mean square for differences between form means was then compared with the ‘‘within’’ or experimental error. The F-ratio is highly significant, indicating that the forms differ in average difficulty.

The error between forms was then compared with measurement error, which is the error within forms. This comparison is also significant. It indicates that although the two forms are equally reliable, the correlation between parallel items on different forms is, on the average, smaller than the correlation between two items from the same form. The two tests cannot, therefore, be said to be measuring the same ability.
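The comparison of error between forms with measurement error is an F ratio of two mean squares. Below is a minimal Python sketch that recomputes both from the Table II sums of squares as SS/d.f. The helper name is ours, and no significance level is computed here.

```python
# Sums of squares and d.f. taken from Table II.
between_forms_error = (33, 33.11365)
within_form_errors = [
    (66, 10.26439), (66, 11.49250),
    (297, 55.62653), (297, 41.65000), (198, 34.60505),
    (264, 44.20602), (231, 42.15809), (231, 37.55515),
]


def mean_square(components):
    """Pool a list of (d.f., sum of squares) pairs into one mean square and its d.f."""
    df = sum(d for d, _ in components)
    ss = sum(s for _, s in components)
    return ss / df, df


ms_between, df_between = mean_square([between_forms_error])
ms_within, df_within = mean_square(within_form_errors)
f_ratio = ms_between / ms_within
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

The ratio comes to roughly 6 on 33 and 1,650 degrees of freedom.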

The estimate of the reliability within forms 
is 


Since both forms are equally reliable, this is the best estimate of the reliability of either form. The best estimate of the standard deviation of either form is 4.26.
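Hoyt's analysis-of-variance reliability (reference 2), which the text equates with the Kuder-Richardson coefficient, is one minus the ratio of the error mean square to the mean square among individuals. Below is a minimal Python sketch that applies it to the pooled Table I mean squares. This is purely a worked illustration of the formula. It is not the within-forms estimate the text reports, which uses the experimental and measurement error mean squares instead.

```python
def hoyt_reliability(ms_individuals, ms_error):
    """Hoyt (1941) reliability: 1 - MS_error / MS_individuals."""
    return 1 - ms_error / ms_individuals


# Mean squares under the null hypothesis, from Table I.
r_pooled = hoyt_reliability(ms_individuals=1.11918, ms_error=0.18459)
print(f"{r_pooled:.3f}")
```

For the Table I values this gives about .835.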

The mean number of items failed on Form B was 12; the mean number of items failed on Form A was 21. Items presented in completion form are evidently much more difficult than the same items presented in multiple-choice form, and they also measure a different ability.


Summary and Conclusions


Two forms of a test were administered to 52 graduates of a training course. The items on the two forms were identical in content but different in modality: Form A presented them in completion form and Form B in multiple-choice form. Form A was administered to 27 students, Form B to 25. An analysis of variance of the item scores indicated that the two forms could not be regarded as equivalent. Their variances and reliabilities did not differ significantly, but their means did (Form A being significantly more difficult than Form B), and the item universes sampled by the two tests also differed significantly.

These results indicate that the form in which items are presented to the individual being tested may have an effect on what the items actually measure, even though their manifest content does not differ.


REFERENCES


1. Gulliksen, H. L. Theory of Mental Tests (New York: Wiley, 1950).

2. Hoyt, C. J. ‘‘Test Reliability Estimated by Analysis of Variance,’’ Psychometrika, VI (1941), pp. 153-160.

3. Johnson, P. O. Statistical Methods in Research (New York: Prentice-

4. Kuder, G. F., and Richardson, M. W. ‘‘The Theory of the Estimation of Test Reliability,’’ Psychometrika, II (1937), pp. 151-160.

5. Votaw, D. F. ‘‘Testing Compound Symmetry in a Normal Multivariate Distribution,’’ Annals of Mathematical Statistics, XIX (1948), pp. 447-473.

6. Wilks, S. S. ‘‘Sample Criteria for Testing Equality of Means, Equality of Variances, and Equality of Covariances in a Normal Multivariate Distribution,’’ Annals of Mathematical Statistics, XVII (1946).




POWER OF THE RANK, MEDIAN AND RUN 
TESTS WHEN TIES ARE NUMEROUS: 
EMPIRICAL STUDY 


MERLE W. TATE 
University of Pennsylvania 


ONE NEED only to look at the statistical lit- 
erature of the past decade to see that nonpara- 
metric or distribution-free test theory has de- 
veloped apace and that there are now available 
nonparametric analogues for most of the stand- 
ard parametric tests. 

Apart from expediency and simplicity, nonparametric tests of differences between two samples have several advantages. They are reliable in the sense that they hold the probability of rejecting a true hypothesis to the specified level (or less, depending on the discreteness of the sampling distribution), whatever the form of the parent population. They can be applied to data available only in ranks. They can also be applied to data that would otherwise need transformation before a parametric test could be used, a transformation that may distort the meaning of the measures.

On the other hand, the tests have noteworthy limitations. They tend to be considerably less efficient than parametric tests and, as a rule, they permit only crude estimation, if any at all. For several of them, notably the run test, the assumption of continuity appears to be crucial. All of them are clouded to some extent by discontinuous data or ties. None of them tests location independent of shape and spread, although both the median and rank tests appear to be sensitive mainly to differences in location. As a further limitation, most of the methods, although intuitively simple, are mathematically complex. Consequently, their power functions are difficult to study analytically; in the case of tied data, the difficulty is very great.

Practically speaking, we seldom have contin- 
uous data. Ties are generally present, and al- 
though they are easily rationalized as the result 
of crude measurement, the rationalization does 
not lessen their nuisance or their possibly dam- 
aging effect on nonparametric tests of signifi- 
cance. 

The question of the effects of ties on the rank*, median, and run tests led the writer to study the matter empirically. A normal population of 400 two-digit scores was constructed, with range 11-71, mean 40, and standard deviation 10. From it, 100 pairs of samples of size 16 and 9 were drawn at random.** It was believed that this simulated a great many situations in educational work and that ties would arise at least as often as they do in practice.

The distribution of the 100 differences between the means of the samples is shown in Table I. The mean of the distribution is .03 and the standard deviation, after Sheppard’s correction, 4.91. The distribution is more variable than expected, the standard error of the difference being, in this case, $(100/16 + 100/9)^{1/2}$ or 4.17. The distribution of the 100 variance ratios was well fitted by an F curve, with P from $\chi^2$ about .20. Except for yielding more variable differences than expected, the samples appeared to be typical.
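The standard error of 4.17 quoted above follows from the population standard deviation of 10 and the two sample sizes. Dividing the added shifts of 4, 8, and 12 points by it gives roughly 1, 2, and 3 standard errors. Below is a small Python check; the function name is ours.

```python
import math


def se_difference(sigma, n1, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(sigma**2 / n1 + sigma**2 / n2)


se = se_difference(sigma=10, n1=16, n2=9)   # population sd 10, samples of 16 and 9
print(f"standard error = {se:.2f}")
for shift in (4, 8, 12):
    print(shift, round(shift / se, 2))       # shift expressed in standard-error units
```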

The normal t, Fisher’s t, the rank, the median, and the run tests were applied to the pairs of samples under four situations: (1) the differences as observed; (2) the differences after 4 was added to each of the original scores in the samples of 9; (3) the differences after 8 was added; and (4) the differences after 12 was added.

Two-sided critical regions were used, with levels of significance at .066, .010, and .001. The .066 level was chosen instead of the .05 because the probabilities on the median test for samples of 16 and 9 are at the points .000, .008, .055, .185, .312, .277, .129, .030, .003, and .000. The median test would thus be very weak at the .05 level. For the run test, the probability of 9 or fewer runs is .146; that of 8 or fewer, .055. At the .066 level, the critical value of the sums of ranks in the sample of 9 is, to the nearest integer, 149 or more or 85 or less. Apart from the critical points not coinciding, the tests are not strictly comparable for a second reason. The run test is aimed at arbitrary deviations from the hypothesis, while the


* The rank test is sometimes called the Wilcoxon T test and sometimes the Mann-Whitney U test. Wilcoxon devised the test and tabulated the critical sums of ranks in samples of equal size in 1945; Mann and Whitney generalized the test to samples of unequal size in 1947, and showed the test to be consistent.


**The author is indebted to Miss Sheila Aikens of the University of Pennsylvania for most of the sampling and computational work and for the preparation of the figures.


TABLE I

DISTRIBUTION OF DIFFERENCES BETWEEN
MEANS OF SAMPLES

Frequency, in order of midpoint (midpoints illegible in the source):
1, 0, 2, 2, 2, 6, 10, 11, 13, 10, 8, 11, 7, 7, 5, 1, 3


TABLE II

EMPIRICAL TWO-SIDED POWER FUNCTIONS AT .066, .010, AND .001 LEVELS OF FIVE
STATISTICAL TESTS WHEN TIES ARE NUMEROUS

Rejection of null hypothesis in 100 comparisons of two samples of sizes 9 and 16

Diff. between   Level of   Normal t                Fisher’s
pop. means      signif.    (expected)   Normal t     t      Rank   Median   Run
0                .066          7           10        11      10      6       2
                 .010          1            3         2       2      0       0
                 .001          0            0         0       0      0       0
4 (apx. 1σ)      .066         19           23         …      24     17      10
                 .010          5            7         8       8      3       2
                 .001          1            3         1       2      1       0
8 (apx. 2σ)      .066         53           57        56      54     34      23
                 .010         25           29        22      24     12       5
                 .001          8           11        11       9      3       2
12 (apx. 3σ)     .066         85           78        17      74     63      38
                 .010         62           61        56      54     32      18
                 .001         34           36        29      25      9       9


[Figure 1. Power curves when level of significance is .066 (from Table II): normal t (expected), normal t, Fisher’s t, rank test, median test, and run test, plotted against the difference between population means (0, 4, 8, 12).]

[Figure 2. Power curves when level of significance is .01 (from Table II): the same six curves.]


other four are aimed at two-sided alternatives. 
However, the tests are ordinarily used and in- 
terpreted about as they are here. 
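Two of the critical points quoted earlier can be reproduced with a short Python sketch. It assumes that, of the 25 pooled scores, 12 lie above the pooled median, which is the split that reproduces the printed probabilities. Under that assumption, the median-test statistic for the sample of 9 is hypergeometric. The rank-sum limits of 85 and 149 are consistent with a normal approximation to the rank-sum distribution (mean 117, variance 312), rounded to the nearest integer. That is our reading of how they were obtained; the text does not say.

```python
from math import comb
from statistics import NormalDist

N1, N2 = 9, 16          # sizes of the two samples
N = N1 + N2
ABOVE = 12              # assumed: 12 of the 25 pooled scores fall above the median

# Null distribution of X, the number of the 9-sample scores above the pooled median.
median_probs = [comb(ABOVE, x) * comb(N - ABOVE, N1 - x) / comb(N, N1)
                for x in range(N1 + 1)]
print([round(p, 3) for p in median_probs])

# Size of the two-sided region X <= 2 or X >= 8.
tail = sum(median_probs[x] for x in (0, 1, 2, 8, 9))
print(f"median-test region size = {tail:.4f}")

# Normal approximation to the rank sum W of the 9-sample at the .066 level.
mean_w = N1 * (N + 1) / 2
sd_w = (N1 * N2 * (N + 1) / 12) ** 0.5
z = NormalDist().inv_cdf(1 - 0.066 / 2)
print(f"reject if W <= {round(mean_w - z * sd_w)} or W >= {round(mean_w + z * sd_w)}")
```

The region X ≤ 2 or X ≥ 8 has size of about .067. This is the nearest attainable level to .05 without going below .041, which is why .066 was preferred.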

The nonparametric tests have recently re- 
ceived treatment in several textbooks and noth- 
ing needs to be said about their application be- 
yond noting the method of dealing with cross 
ties, i.e., ties affecting both samples. In the 
rank test, the average of the serial ranks tied 
for was assigned to each of the ties; in the med- 
ian test, ties occurring at the median score 
were broken at random; and in the run test, tied 
scores were ordered at random. 
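The midrank rule used for cross ties in the rank test can be sketched in Python as follows. The scores are hypothetical.

```python
def midranks(scores):
    """Rank pooled scores, giving tied scores the average of the ranks they occupy."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every score tied with the one at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average = (i + j) / 2 + 1        # mean of serial ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


sample_a = [42, 47, 47, 51]              # hypothetical scores, first sample
sample_b = [38, 42, 44, 47]              # hypothetical scores, second sample
ranks = midranks(sample_a + sample_b)
print(ranks)
print(sum(ranks[:len(sample_a)]))        # rank sum for the first sample
```

The two 42s share ranks 2 and 3 and so both receive 2.5. The three 47s share ranks 5 to 7 and each receive 6, so the total of all ranks is unchanged.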

The extent of ties and the number of pairs of samples affected are shown below:

No. of          When true difference was:
cross ties       0      4      8     12
    0            0      0      2      8
    1           10      5     22     30
    2           29     25     23     34
    3           34     33     27     14
    4           20     23     17      7
    5            7     12      7      7
    6            0      2      2      0

The results of the tests are shown in Table II. The entries in the third column are the numbers of rejections expected from the normal t test.

The power functions are plotted in Figures 1 and 2. In this situation, the rank test is nearly as powerful as the parametric tests. The run test is markedly inferior to the others. This is to be expected for two reasons: the run test is no more a test of location than of spread, and it rests more heavily upon the assumption of continuity.

Theoretical studies indicate that, for continuous data, the difference in power between the rank test and the t test is small in both large and small samples. The experimental results reported above suggest that the similarity also holds for tied data. The rank test appears to be well worth a place in applied statistics.


FACTOR ANALYSIS OF HIGH SCHOOL VARI- 
ABLES AND SUCCESS IN UNIVERSITY SUB- 
JECTS FOR THE FIRST SEMESTER IN 
THE UNIVERSITY 


ELIZABETH C. BAKER and GEORGE A. BAKER 
University of California at Davis 


Introduction and Summary 


A PREVIOUS study (3) has been made of the 
relationships between high school scholastic ex- 
perience and success in the first semester in 
the University of California at Davis. In this 
previous study, success in the first semester 
in the University was measured in four different 
ways all of which turned out to be practically 
equivalent. These four variables were based 
on courses undertaken by the students during 
their first semester. All students did not take 
identical programs. If high school experience 
prepares for some courses better than for 
others then part of the failure to predict Univer- 
sity success from high school experience com- 
pletely is due to the fact that the programs of 
the students are not identical. 

Thus, it becomes important to study success in each subject in terms of preceding high school experience, to see to what extent success can be expressed in terms of high school performance. This study finds that the common first-semester University courses differ markedly in their profiles with respect to a set of typical high school variables. They also differ in the extent to which high school variables determine or measure success in each course.


Variables 


The subjects required for admission to the University of California, known as the a-f requirements, are:


a. U. S. History - 1 unit

b. English - 3 units

c. Mathematics - 2 units in algebra, geometry, or trigonometry

d. Laboratory science - 1 unit

e. Foreign language - 2 units in the same language

f. Additional advanced course - 1 (or 2) units in mathematics, foreign language, chemistry, or physics


Eleven variables (nine high school variables, total University units, and the U.C. differential) were considered in connection with the grade point average for each University subject. Only subjects with enough data to make an analysis worthwhile were included. The variables are defined as follows:


1. 2V23 - Total University units including Subject A (non-credit English).

2. V9 - The University of California differential. It is computed for each high school on the basis of all students regularly admitted from the school for the past five years. It is the difference between two averages for these students: the average they made in their first semester in the University, and their entering average on the a-f subjects taken in the last three years of high school.

3. Grade point average on a-f requirements for the last three years.

4. Grade point average on acceptable English.

5. Grade point average on third and fourth year English, mathematics, laboratory science, foreign language, and history or social science including United States history.

6. 3V2g - Grade point average on all foreign language.

7. 3V29 - Grade point average on all history.

8. 1V13 - Grade points on acceptable mathematics.

9. V3 - Grade point average on acceptable mathematics.

10. 1V14 - Grade points on acceptable laboratory science.

11. 3V14 - Grade point average on acceptable laboratory science.

The University subjects considered are Chem- 
istry 1A, Subject A, English 1A, Zoology 1A, 
Mathematics D (intermediate algebra), Psychol- 
ogy 1A, Animal Husbandry 7 (survey of animal 
production), and Decorative Art 6A (design and 
color). 


Data 


The data consist of information on 178 students who entered the University of California at Davis directly from California high schools in the fall of 1953. The numbers of students taking the University courses listed above varied from 91 to 44. Analyses based on fewer students were not felt to be defensible. Admittedly, even in the subjects considered the numbers are rather small. Nevertheless, the contrasts noted are thought to be striking enough to publish, and sufficient to illustrate the method of analysis, which is believed to be novel.

[Table: first-factor weightings of grade points in eight University subjects on the high school variables and total college load; rotated in the original and not recoverable from the scan.]


Analysis of the Data 


The students among the 178 who took each of the eight college courses were listed in turn, together with their scores in these courses and their values for the eleven variables detailed above. Grades A, B, C, and D or F were scored 3, 2, 1, and 0, respectively. All possible correlation coefficients were computed, giving rise to a 12 x 12 correlation matrix for each college subject. A factor analysis was carried through to three factors in each case as detailed by Cattell (4), Holzinger and Harman (5), and Thurstone (6). Three factors were sufficient to reduce the elements of the residual matrices practically to zero in every case.

The 3 x 12 matrices were then rotated so that all of the determination of the college subject occurred in the first column. This was accomplished by using Lagrange multipliers to determine the F₁ column of the Λ-matrix (see ref. 4, p. 195) so that the weighting of the college subject on the first factor (F′₁) is as large as possible. For similar rotations of the factor matrix see Baker (1, 2).

The weightings of the other variables on the first factor (F′₁) of the rotated factor matrix indicate the importance of these variables in connection with the college variable. The size of the college variable's weight indicates its over-all connection with the other variables. The determinability of the college variable is measured by the square of its first factor weight. Its uniqueness, or independence of the other variables, is measured by 1 − (square of the first factor weight). Thus, 25% of the performance in English 1A is due to something common to it and the other variables, while 75% is due to something specific to English 1A as given at the University of California at Davis.
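If the rotation is assumed to be orthogonal, the maximization described above has a simple closed form. Maximizing the subject's weight subject to a unit-length rotation column gives that column as the subject's own loading row scaled to unit length. The subject's largest attainable weight is then the square root of its communality. Below is a minimal Python sketch under that assumption. The loadings are hypothetical, chosen so that the subject's determinability is 25%, as in the English 1A illustration. The general Λ-matrix construction in Cattell (4) is not reproduced here.

```python
import math


def first_factor_for(subject_row, loadings):
    """Orthogonal rotation that maximizes the subject's weight on the first factor.

    Maximizing sum(a_j * l_j) subject to sum(l_j ** 2) == 1 (a Lagrange problem)
    gives l = a / |a|, so the subject's largest attainable weight is |a|.
    """
    norm = math.sqrt(sum(a * a for a in subject_row))
    direction = [a / norm for a in subject_row]
    # Each other variable's weight on the rotated first factor.
    weights = [sum(b * l for b, l in zip(row, direction)) for row in loadings]
    return norm, weights


# Hypothetical three-factor loadings (not the paper's values).
subject = [0.30, 0.40, 0.00]
high_school = [[0.60, 0.20, 0.10], [0.10, 0.50, 0.30]]

w, hs_weights = first_factor_for(subject, high_school)
determinability = w ** 2
print(f"weight {w:.2f}, common {determinability:.0%}, unique {1 - determinability:.0%}")
print([round(x, 2) for x in hs_weights])
```

For these made-up loadings the subject's weight is .50, so 25% of its variance is common and 75% unique. The two other variables receive first-factor weights of .52 and .46.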

The details of the profiles of the eight college subjects for the eleven variables considered are given in the table.


Discussion 


If we examine the factor weightings for the college subjects in the table, we see that Subject A, English 1A, Animal Husbandry 7, and Decorative Art 6A have little in common with the other variables. That is, the other variables account for only about 25 to 31% of the variation in performance in these college subjects. The college subject graded most like high school subjects is Psychology 1A (69%). Chemistry 1A (45%), Zoology 1A (56%), and Mathematics D (50%) are very similar in their connection with high school variables.

There are some striking contrasts between the weightings on individual variables for the different college subjects:

The U. C. differential (V9) has similar weights for Chemistry 1A, Zoology 1A, and Mathematics D, but seems to be of little importance for the others.

Grade point average on a-f requirements (3Vj9) is especially high for Psychology 1A, Animal Husbandry 7, and Decorative Art 6A.

High school history (3V29) is high for English 1A but is zero for Subject A.

Laboratory science (3V14) seems to be of some importance for all college subjects except English 1A and Subject A, and to be especially important for Animal Husbandry 7.

Grade point average on foreign languages (3V2g) seems to be more important for Subject A than for English 1A. It also seems to be quite important for Animal Husbandry 7, Psychology 1A, Decorative Art 6A, and Mathematics D.

Similar interesting variations could be noted further.

There are two main points to be noted. First, college subjects differ widely in their connections with the performances measured and recorded for students at high school age. Evidently, teachers of college English look for and prize something that is not valued at the high school level, or perhaps is not even present in a measurable degree. This quality may be emotional, or closely related to the emotions. All of the teachers of college courses seem to look for something different from what is valued in high school: perhaps maturity, ability to grow and develop, and so on.

Second, the weightings of the different college subjects on the individual high school variables show great variation. Thus, high school history seems to be of considerable importance for English 1A but not for Chemistry 1A, Subject A, and Zoology 1A.

The implications of these suggested findings 
are far reaching and of great importance from 
the standpoint of preparing for post high school 
work and the admission of high school graduates 
into institutions of higher learning. Except for 
the complications of an administrative nature, 
perhaps the admission requirements for courses 
at an institution should be stated instead of the 
admission requirements to the institution. 

Further, it must be clear that the success of a student in the University depends very much on the program that he elects to take. Thus, any attempt to predict success in the University that does not consider the actual courses taken is doomed to partial failure. It is also possible that a particular course is not well defined: the same course, given by different people at different times, may be essentially different. It is difficult to secure information on such matters.


REFERENCES 


1. Baker, G. A. ‘‘Organoleptic Ratings and Analytical Data for Wines Analyzed into Orthogonal Factors,’’ Food Research, XIX (1954), pp. 575-580.

2. Baker, G. A. ‘‘Factor Analysis of Relative Growth,’’ Growth, XVIII (1954), pp. 137-143.

3. Baker, Elizabeth C., et al. ‘‘Factor Analysis of High School Scholastic Experience and Success in the First Semester at the University of California at Davis,’’ College and University, XXX (1955), pp. 351-358.

4. Cattell, Raymond B. Factor Analysis (New York: Harper and Brothers, 1951), 475 pp.

5. Holzinger, Karl J., and Harman, Harry H. Factor Analysis (Chicago: University of Chicago Press, 1941), 429 pp.

6. Thurstone, L. L. Multiple-Factor Analysis (Chicago: University of Chicago Press, 1947), 552 pp.


A COMPARISON OF THE ACCURACY OF THE 
FORMULA FOR THE STANDARD ERROR 
OF PEARSON “r” WITH THE ACCURACY 
OF FISHER’S z-TRANSFORMATION


HARVEY F. DINGMAN 
University of Southern California 
NORMAN C. PERRY 
Alabama Polytechnic Institute 


ESTIMATES BASED on the approximate standard error formula $\sigma_r = (1 - r^2)/\sqrt{N - 1}$ are used in education and psychology (3) to judge the sampling significance of an obtained product-moment correlation. Fisher's normalizing transformation $z' = \tfrac{1}{2}\log_e[(1 + r)/(1 - r)]$, together with the formula $\sigma_{z'} = 1/\sqrt{N - 3}$, is also used for the same purpose (2). Neither of these processes is completely accurate (4). The extensive and exact tabulation of F. N. David (1) affords a convenient basis for study. The purpose of this paper is therefore to present, in tabular form, a numerical comparison of the error in these two methods.

The Fisher process is more elaborate, but on mathematical grounds it is known to be, in general, of greater accuracy. Aside from theoretical interest, therefore, the main purpose of our tabulation is to give more precise information on when it is safe to use the standard error formula and when it is better to use the more complex procedure of the z-transformation.


Plan of Tabulation 


The numerical comparison considers sampling limits derived by the various methods for selected combinations of population correlation $\rho$ and sample size N. These limits are computed for ±1σ values and ±1.96σ values, for values of $\rho$ by tenths from 0.0 to 0.9.

We now illustrate the method of comparison for the case of a sample size N = 5. Table I shows, for each $\rho$, the sampling limits one standard deviation above the mean as determined by all three methods.

Inspection of these results shows that the $z'$ process is in error by the amounts .04, .03, .01, .01, .00, .00, .01, .00, .00, and .01. Hence the average discrepancy is readily calculated to be .011. For $\sigma_r$ the successive errors are .07, .04, .03, .00, .01, .03, .03, .04, .04, and .02, and the average discrepancy is .031.
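The first entries of this comparison can be checked in a case where the exact distribution is elementary. For $\rho = 0$ the density of r is proportional to $(1 - r^2)^{(N-4)/2}$, which for N = 5 is $\sqrt{1 - r^2}$. The sketch below reads a limit "one standard deviation above the mean" as the 84.13th percentile of the exact distribution. That reading is our assumption; the text does not state it. Under this assumption, the Python sketch reproduces the errors of .04 for $z'$ and .07 for $\sigma_r$ listed first above, and the .011 average.

```python
import math
from statistics import NormalDist

N = 5
P_UPPER = NormalDist().cdf(1.0)   # assumed reading of "1 sigma above": the 84.13th percentile


def sigma_r_upper(rho, n):
    """Upper 1-sigma limit from the standard error formula."""
    return rho + (1 - rho**2) / math.sqrt(n - 1)


def fisher_upper(rho, n):
    """Upper 1-sigma limit from Fisher's z' = atanh(r) with sd 1/sqrt(n - 3)."""
    return math.tanh(math.atanh(rho) + 1 / math.sqrt(n - 3))


def exact_upper_rho0_n5(p):
    """Percentile of r for rho = 0, N = 5, where the density is proportional to sqrt(1 - r**2)."""
    cdf = lambda x: 0.5 + (x * math.sqrt(1 - x * x) + math.asin(x)) / math.pi
    lo, hi = -1.0, 1.0
    for _ in range(60):           # bisection on the exact CDF
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


exact = exact_upper_rho0_n5(P_UPPER)
fisher_error = fisher_upper(0.0, N) - exact
sigma_error = exact - sigma_r_upper(0.0, N)
print(f"z' error {fisher_error:.2f}, sigma_r error {sigma_error:.2f}")

z_errors = [.04, .03, .01, .01, .00, .00, .01, .00, .00, .01]
print(f"average z' discrepancy {sum(z_errors) / len(z_errors):.3f}")
```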

A corresponding set of computations for the lower 1σ limits yields an average error of .134 for the Fisher method but only .086 for the $\sigma_r$ formula.

For the upper limit at the 5% level, the Fisher method is in error by only .006 and the $\sigma_r$ formula by .201. For the lower 5% limit, the Fisher error is .103 and the $\sigma_r$ error .279.

This example suggests certain trends which 
are extensively investigated in Table II. The lat- 
ter presents in summarized form a development 
similar to the above for sample sizes N = 3, 4, 
5, 6, 7, 8, 10, 12, 15, 20, 25, 50, 100, 200, and 
400. 


Summary of Significant Trends in 
Tabular Results 


An examination of Table II leads to several conclusions useful in the practical application of the $\sigma_r$ and Fisher approximate methods.

Suppose .01 is arbitrarily selected as the maximum allowable average error for use in psychology. Then for upper 1σ limits, the smallest sample size that should be used with $\sigma_r$ is N = 19. For the $z'$ method and the same criterion of accuracy, the minimal sample size for upper 1σ limits is N = 6. For lower 1σ limits the smallest sample size is N = 22. It follows that for upper 1σ limits the Fisher method is more accurate than the $\sigma_r$ formula. For lower 1σ limits, the $\sigma_r$ method is superior.

However, at the 5% level of confidence, where one is working with limits 1.96σ above and below the mean, the Fisher method is superior in all instances. The $z'$ process gives results accurate to within .01 for sample size N = 4 at the upper limit and N = 17 at the lower limit. In contrast, for .01 accuracy the $\sigma_r$ method requires samples larger than 50 at the lower limit and larger than 100 at the upper limit.


[Tables I and II, rotated in the original and not recoverable from the scan. Table I gives the sampling limits for N = 5 by all three methods. Table II gives the average error in the 1σ and 1.96σ limits for N = 3 to 400.]


Summary


In the previous sections a comparative study was made of the error in the $\sigma_r$ and Fisher methods for approximating the confidence limits of correlation coefficients. The tabulation of F. N. David was used as a standard of comparison.

As predicted by theory, the Fisher method was more accurate in every instance at the 5% level of confidence. For limits one standard deviation in each direction, however, the results were mixed. The Fisher method was still superior at the upper limit, but it turned out to be less accurate than the $\sigma_r$ formula for lower limits.

Both methods were shown to be less accurate 
for lower limits than for upper limits. This 
empirical finding is in agreement with the known 
skewing of the sampling distribution of 
the correlation coefficient for non-zero 
population correlations. 




REFERENCES 


1. David, F. N. Tables of the Correlation Coefficient (Cambridge, England: University Press, 1938).

2. Fisher, R. A. Statistical Methods for Research Workers (New York: ner Publishing Co., 1948).

3. Guilford, J. P. Fundamental Statistics in Psychology and Education (New York: McGraw-Hill Book Co., 1950).

4. Hotelling, Harold. ‘‘New Light on the Correlation Coefficient and Its Transforms,’’ Journal of the Royal Statistical Society, Series B, Vol. 15 (1953), pp. 193-232.


A PREDICTIVE CONFIDENCE INTERVAL FOR 
THE VALIDITY COEFFICIENT 


R. B. McHUGH 
Iowa State College 
Ames, Iowa 


IN RECENT discussions of test theory and 
practice, some attention has been focused on the 
issues involved in cross-validation. For exam- 
ple, in the Technical Recommendations of the 
APA (1: c. 11), it is termed ‘‘essential’’ that, 
after having used the criterion data from the or- 
iginal sample in order to select and weight test 
items, the test constructor draw a new sample 
for estimation of the relationship between test 
and criterion. This use of a fresh sample thus 
permits an estimate—both point and interval— 
of the population validity coefficient p of the test 
for the criterion, which is not subject to the pos- 
itive bias existing in the estimate of p obtained 
from the original sample alone. 

For example,* after item analysis on a preliminary sample, a new sample of $n_1 = 114$ San Diego gunner's mate trainees yielded a correlation of $r_1 = .50$ between the Navy GCT and a criterion of grades in gunner's mate school. This point estimate, $r_1 = .50$, of the population validity coefficient $\rho$ can be supplemented by an approximate $1 - \alpha$ percent confidence interval estimate for $\rho$. As is well known (ref. 2), this is obtained from Fisher's $Z_r$ transformation

$$Z_r = \tfrac{1}{2}\log_e \frac{1 + r}{1 - r} \tag{1}$$

its approximate standard deviation

$$\sigma_{Z_r} = \frac{1}{\sqrt{n - 3}} \tag{2}$$

and the distribution of the normal random variable Z, as follows. Approximate lower and upper $1 - \alpha$ confidence limits for $\rho$ are, respectively, the inverse $Z_r$ transforms of

$$Z_{r_1} + Z_{\alpha/2}\,\sigma_{Z_{r_1}}, \qquad Z_{r_1} + Z_{1-\alpha/2}\,\sigma_{Z_{r_1}} \tag{3}$$
As an illustration, suppose that 99 percent limits are desired for the present data. Then $r_1 = .50$ transforms via (1) to $Z_{r_1} = .549 \approx .55$, and from (2), $\sigma_{Z_{r_1}} = 1/\sqrt{114-3} = .095$. Since for $\alpha = 1$ percent the normal deviates are $Z_{\alpha/2} = Z_{.005} = -2.58$ and $Z_{1-\alpha/2} = Z_{.995} = 2.58$, the 99 percent confidence limits on $\rho$ are the inverse $Z_r$ transforms of $.55 \pm (2.58)(.095)$, i.e., of .31 and .79, which are

$$[.30,\ .66] \tag{4}$$
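As a minimal sketch (the article's computation was done by hand; this is not the author's procedure), interval (3) can be written in Python with only the standard library. The names `fisher_z`, `inverse_fisher_z` and `rho_interval` are illustrative, and the `z_crit` parameter is an assumption added so that the rounded deviate 2.58 of the text can be reproduced; when it is omitted, the exact deviate is taken from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def fisher_z(r):
    """Fisher's transformation (1): Z_r = (1/2) * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def inverse_fisher_z(z):
    """Inverse of (1); solving (1) for r gives r = tanh(Z_r)."""
    return math.tanh(z)


def rho_interval(r1, n1, conf=0.99, z_crit=None):
    """Approximate 100*conf percent limits (3) for the population coefficient rho.

    z_crit is the upper normal deviate Z_{1-alpha/2}; if omitted it is computed
    exactly from the standard normal distribution.
    """
    alpha = 1 - conf
    if z_crit is None:
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z1 = fisher_z(r1)
    sigma = 1 / math.sqrt(n1 - 3)  # approximate standard deviation (2)
    # By symmetry Z_{alpha/2} = -z_crit, so the limits (3) are z1 -/+ z_crit * sigma.
    return inverse_fisher_z(z1 - z_crit * sigma), inverse_fisher_z(z1 + z_crit * sigma)


# San Diego example: r1 = .50, n1 = 114, 99 percent limits with the rounded deviate 2.58.
lo, hi = rho_interval(0.50, 114, conf=0.99, z_crit=2.58)
print(f"[{lo:.2f}, {hi:.2f}]")  # prints [0.30, 0.66]
```

Because the inverse transform is just the hyperbolic tangent, the back-transformation needs no table lookup; using the exact deviate (about 2.576) instead of 2.58 changes the limits only in the fourth decimal place.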


However, even if the test developer has accomplished the important cross-validation step just illustrated, there remains a further issue which, though apparently customarily overlooked, deserves explicit consideration. This is the fact that the foregoing inference is, in the convenient terminology of Tintner (ref. 5), an inverse inference, whereas what is frequently required in practice here is a predictive inference. That is, inverse inference is “inference from the sample to the whole population” (ref. 5), e.g., inferring $\rho$ from $r_1$, or from the interval (3) based on $r_1$, as illustrated above. Predictive inference, on the other hand, is “inference from a given sample to another future unknown sample.” For example, given the sample validity coefficient $r_1$, the problem of inferring an interval estimate of $r_2$, the validity coefficient in a future unknown sample, is a problem of predictive inference. This latter problem is often the more important one from the practitioner's viewpoint, since in applied terms it asks: how useful can the test be expected to be in assessing the criterion in a future sample?

The problem may then be formulated as follows: given a first (cross-validation) sample of size $n_1$ with validity coefficient $r_1$, obtain a predictive $100(1-\alpha)$ percent confidence interval, not for $\rho$ but for the validity coefficient $r_2$ obtained from a future unknown sample of size $n_2$ drawn independently from the same bivariate normal parent population. The answer, implicit in the work of Kitagawa (ref. 3), is that lower and upper $100(1-\alpha)$ percent limits for $r_2$ are, respectively, the inverse $Z_r$ transforms of

$$Z_{r_1} + Z_{\alpha/2}\,\sigma_{Z_{r_2}-Z_{r_1}}, \qquad Z_{r_1} + Z_{1-\alpha/2}\,\sigma_{Z_{r_2}-Z_{r_1}}, \tag{5}$$

where, since the samples are independent, the standard deviation of $Z_{r_2} - Z_{r_1}$ is

$$\sigma_{Z_{r_2}-Z_{r_1}} = \sqrt{\frac{1}{n_1-3} + \frac{1}{n_2-3}}. \tag{6}$$

The analogy with (3) is obvious; only the variability, and hence the interval length, has increased, showing that predictive inference is here more hazardous than inverse inference.

*Thanks are due to Dr. William A. Owens, Iowa State College, for supplying the present data, which are subsumed in the summary by Stuit et al. (ref. 4).
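The statement that predictive inference is the more hazardous of the two can be illustrated by a small Monte Carlo experiment. The sketch below is not part of the original study: it assumes a hypothetical population validity coefficient $\rho = .50$, draws independent bivariate normal samples of the sizes used in the text ($n_1 = 114$, $n_2 = 107$), and counts how often a future $r_2$ falls inside the inverse interval (3) and inside the predictive interval (5). The seed, the replication count and all function names are arbitrary illustrative choices.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Sample product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bivariate_normal_sample(n, rho, rng):
    """n pairs from a standard bivariate normal population with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys


def coverage_of_future_r(rho=0.5, n1=114, n2=107, conf=0.99, reps=2000, seed=1956):
    """Fraction of replications in which a future r2 lies inside
    (a) the inverse interval (3) and (b) the predictive interval (5), both built from r1."""
    rng = random.Random(seed)
    zc = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    sd_inverse = 1 / math.sqrt(n1 - 3)                        # formula (2)
    sd_predictive = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # formula (6)
    hits_inverse = hits_predictive = 0
    for _ in range(reps):
        z1 = math.atanh(pearson_r(*bivariate_normal_sample(n1, rho, rng)))
        z2 = math.atanh(pearson_r(*bivariate_normal_sample(n2, rho, rng)))
        # Comparing on the Z scale is equivalent to comparing r2 with the
        # back-transformed limits, because tanh is strictly increasing.
        hits_inverse += abs(z2 - z1) <= zc * sd_inverse
        hits_predictive += abs(z2 - z1) <= zc * sd_predictive
    return hits_inverse / reps, hits_predictive / reps


inv_cov, pred_cov = coverage_of_future_r()
print(f"inverse interval (3) contains r2:    {inv_cov:.3f}")
print(f"predictive interval (5) contains r2: {pred_cov:.3f}")
```

Under the normal approximation, the inverse interval has half-width of about $2.58 \times .095 \approx .245$ on the $Z$ scale while $Z_{r_2} - Z_{r_1}$ has standard deviation of about .136, so it should capture $r_2$ only about 93 percent of the time; the predictive interval should come close to the nominal 99 percent.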

As an example of (5), consider the San Diego data again, where $n_1 = 114$ and $r_1 = .50$. Suppose a 99 percent predictive confidence interval for $r_2$ based on $n_2 = 107$ is desired. Here, from (6), $\sigma_{Z_{r_2}-Z_{r_1}} = \sqrt{1/111 + 1/104} = .136$. Hence the approximate 99 percent interval is the inverse $Z_r$ transform of $.55 \pm (2.58)(.136)$, i.e., of .20 and .90, which is

$$[.20,\ .72] \tag{7}$$

and which should be contrasted with (4) above.
Some substantiating evidence is provided by the fact (cf. footnote *) that another actual sample of $n_2 = 107$ gunner's mate trainees from the San Diego installation yielded a sample validity coefficient of $r_2 = .21$, which is included within the limits of the estimate (7) above, but not within (4).
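A short check of both San Diego intervals against the second sample's $r_2 = .21$ can be sketched in Python as follows. The helper names are hypothetical; the sketch uses the rounded deviate 2.58 of the text but carries full precision throughout instead of rounding $Z_{r_1}$ to .55.

```python
import math


def fisher_z(r):
    """Formula (1); math.atanh(r) equals (1/2) * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def limits(z_center, sd, z_crit=2.58):
    """Back-transformed limits z_center -/+ z_crit * sd, as in (3) and (5)."""
    return math.tanh(z_center - z_crit * sd), math.tanh(z_center + z_crit * sd)


n1, r1, n2 = 114, 0.50, 107
z1 = fisher_z(r1)
inverse = limits(z1, 1 / math.sqrt(n1 - 3))                        # inverse interval for rho
predictive = limits(z1, math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3)))    # predictive interval for r2

r2_observed = 0.21
for name, (lo, hi) in [("inverse", inverse), ("predictive", predictive)]:
    inside = lo <= r2_observed <= hi
    print(f"{name}: [{lo:.3f}, {hi:.3f}]  contains r2 = .21: {inside}")
```

Carrying full precision gives limits of roughly [.295, .661] and [.195, .717]; either way, .21 lies inside the predictive interval and outside the inverse one.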


REFERENCES 


1. American Psychological Association. “Technical Recommendations for Psychological Tests and Diagnostic Techniques,” supplement to Psychological Bulletin, Vol. 51 (1954).

2. Johnson, Palmer O. Statistical Methods in Research (New York: Prentice-Hall, Inc., 1949).

3. Kitagawa, Tosio. “Successive Process of Statistical Inference,” Memoirs of the Faculty of Science, Kyusyu University, Series A, Vol. 5, No. 2 (Fukuoka, Japan: Kyusyu University, 1950).

4. Stuit, D. B., et al. Personnel Research and Test Development in the Bureau of Naval Personnel (Princeton, N.J.: Princeton University Press, 1947), 513 pp.

5. Tintner, Gerhard. “Foundations of Probability and Statistical Inference,” Journal of the Royal Statistical Society, Series A (General), Vol. 112, Part III (1949), pp. 251-286.


RATE CARD 


Beginning September 1, 1956, the new rates for the JOURNAL
OF EXPERIMENTAL EDUCATION will be:


Subscriptions: 1 Year


2 Years


All orders received before September 1, 1956 will be 
accepted at the old rates. 


Foreign Rates will be 50 cents extra each year for postage. 


Single copy prices will remain $1.50 each. If the remit- 
tance is included with orders for the single copies, we will 
pay the postage. 


DEMBAR PUBLICATIONS, INC.

JOURNAL OF EXPERIMENTAL EDUCATION
P. O. Box 737

MADISON, WISC.


Specifications for Manuscripts 


JOURNAL OF EDUCATIONAL RESEARCH 
and the... 


JOURNAL OF EXPERIMENTAL EDUCATION 


1. All manuscripts must be typewritten, double spaced, and on one side
of the sheet only. Mimeographed and ditto sheets are acceptable only
when very clearly printed.


2. All unusual symbols or formulae must be very clearly typed or hand 
printed in black ink. To avoid costly printers’ composition charges it 
may be necessary for us to make cuts of difficult matter, or to print 
your material by the photo-offset lithography method. The latter means 
photographing your actual copy. It is expensive to have material re- 
drawn by our own artists, and retracing or duplicating increases the 
hazards of error. See that your copy is correct and complete as you 
wish to have it reproduced. The men who work on your manuscripts 
your technical field. 


drawings, graphs or other illustrated materials, they must be neatly
done, in black ink, on bond paper or tracing cloth suitable for repro- 
duction. Remember our magazines are printed in black ink only. Color 
graphs should be changed by the author to provide different kinds of 
shading for the different areas. For example: diagonal lines for red, 
vertical lines for blue, etc. Provide a key. 


4. All tables, graphs, etc., on sheets by themselves must be properly labeled 
and identified in relation to the written copy of the manuscript. 

5. Footnotes must be complete as to author, title, place of publication,
publisher, date and pages. They must be numbered consecutively 
throughout the article. 

6. Bibliographical notes must be complete and arranged alphabetically. 


The cooperation of all prospective authors in following these rules is 
earnestly required. It is difficult to produce technical journals accurately, 
neatly, and on time under the best conditions. Promptness in printing, 
economy, and accuracy will be promoted by carefully prepared manuscripts. 




